Exploratory data analysis

After the data is collected, we perform exploratory data analysis using simple visualization methods to gain a better understanding of the data that we will be working with during the analytical process.

In this critical thinking assignment, you will build on the results of the preliminary data analysis that you completed in the previous week, using R and Tableau to investigate the variables relevant to your research questions.

R can be downloaded free from the closest mirror servers in Turkey: either https://cran.pau.edu.tr/ or https://cran.ncc.metu.edu.tr/.

Here is the list of all mirrors: https://cran.r-project.org/mirrors.html

Tableau for Students can be downloaded free here: https://www.tableau.com/academic/students.

Students will need to apply for a student license (and renew it when it expires): https://www.tableau.com/academic/students#form


Continuing with the scenario you selected for your assignment last week (Titanic or Telecom Customer Churn), complete the following requirements. You will reuse some of your previous work in this assignment.


Part I: Contextualization

In the Introductory section of your report, provide a contextualization of the dataset by doing the following:

  1. Specify the selected dataset.
  2. Summarize the three research questions you have constructed.
  3. Summarize the findings from the preliminary data analysis.
  4. For each research question, identify the target (dependent) and predictor (independent) variables.

Part II: Statistical Description

  1. Write an R script that uses the summary() function to generate descriptive statistics for the variables in the data subset. Capture the output of the R code.
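
As a starting point, the task above could be approached along the following lines. This is only a minimal sketch: the data frame `passengers`, its column names, and the output file name are hypothetical placeholders, and the values are invented for illustration rather than taken from either dataset. In your own script you would load your data subset (for example with read.csv()) instead of building a toy data frame.

```r
# Hypothetical toy subset; replace with your own data, e.g. read.csv(<your file>).
# Column names and values are illustrative assumptions, not the real dataset.
passengers <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0)),
  Pclass   = factor(c(3, 1, 2, 3, 1, 3)),
  Age      = c(22, 38, 26, 35, NA, 54),
  Fare     = c(7.25, 71.28, 7.92, 8.05, 53.10, 51.86)
)

# summary() reports min, quartiles, mean, and max for numeric columns,
# level counts for factors, and the number of missing values (NA's).
stats <- summary(passengers)
print(stats)

# Capture the printed output as text so it can be pasted into the report.
captured <- capture.output(stats)
out_file <- file.path(tempdir(), "summary_output.txt")  # assumed file name
writeLines(captured, out_file)
```

Storing categorical variables as factors is a deliberate choice here: summary() then prints a count per category rather than a mean, which is usually more meaningful for variables such as a class or churn flag.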

Part III: Univariate Analysis

  1. Write an R script to count the different values of each relevant variable (column) in the dataset(s).
  2. Use the counts calculated in task 1 to generate a histogram for each variable in Tableau.
  3. Summarize the findings from the inspection of the histograms that support the univariate distribution analyses.
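
One possible way to produce the value counts and hand them to Tableau is sketched below. The helper `count_values()`, the toy data frame, and the CSV file name are all hypothetical; the sketch assumes that a single long-format CSV (one row per variable and value) is a convenient input for Tableau, which is a design choice rather than a requirement.

```r
# Hypothetical toy subset; column names and values are illustrative assumptions.
passengers <- data.frame(
  Pclass = c(3, 1, 2, 3, 1, 3),
  Sex    = c("male", "female", "female", "male", "female", "male"),
  stringsAsFactors = FALSE
)

# Count the distinct values of one column, including NA if any are present.
count_values <- function(data, column) {
  counts <- as.data.frame(table(data[[column]], useNA = "ifany"),
                          stringsAsFactors = FALSE)
  names(counts) <- c("value", "count")
  counts$variable <- column
  counts[, c("variable", "value", "count")]
}

# Stack the counts for every column into one long table.
all_counts <- do.call(rbind, lapply(names(passengers), count_values,
                                    data = passengers))
print(all_counts)

# Export for Tableau (assumed file name; written to a temporary folder here).
out_file <- file.path(tempdir(), "value_counts.csv")
write.csv(all_counts, out_file, row.names = FALSE)
```

For continuous variables with many distinct values (such as a fare or monthly charge), counting raw values gives a very long table; binning them first, for example with cut(), may give more readable histograms.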

Part IV: Bivariate Analysis

  1. Select appropriate visualization formats to depict the relationship between the target and the predictor variables.
  2. Summarize the observations from your visual inspection of the visualizations completed in task 1.
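
Before choosing a chart type, it may help to tabulate each predictor against the target in R. The sketch below is illustrative only: the data frame and column names are hypothetical, the values are invented, and the pairing of summaries with chart types in the comments is a common rule of thumb rather than a requirement of this assignment.

```r
# Hypothetical toy subset; column names and values are illustrative assumptions.
passengers <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Pclass   = c(3, 1, 2, 3, 1, 3),
  Fare     = c(7.25, 71.28, 7.92, 8.05, 53.10, 51.86)
)

# Categorical predictor vs. categorical target: a cross-tabulation,
# often shown as a stacked or side-by-side bar chart.
class_by_survival <- table(Pclass = passengers$Pclass,
                           Survived = passengers$Survived)

# Row proportions: the share of each target value within each class.
survival_share <- prop.table(class_by_survival, margin = 1)
print(round(survival_share, 2))

# Numeric predictor vs. categorical target: a per-group summary,
# often shown as a box plot.
fare_by_survival <- aggregate(Fare ~ Survived, data = passengers, FUN = median)
print(fare_by_survival)
```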

Part V: Outliers

  1. Describe the outliers in the dataset(s). Provide the calculations used to identify them, using a tool and method of your choice.
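
Since the method is left to you, the following sketch shows one common option, the 1.5 x IQR rule (Tukey's fences), in R. The helper `iqr_outliers()` is hypothetical and the fare values are invented for illustration; other methods, such as z-scores, would be equally acceptable.

```r
# Flag values outside [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is the usual default.
iqr_outliers <- function(x, k = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  is_outlier <- !is.na(x) & (x < lower | x > upper)
  list(lower = lower, upper = upper, outliers = x[is_outlier])
}

# Invented example values (not taken from the real dataset).
fares <- c(7.25, 71.28, 7.92, 8.05, 53.10, 51.86, 512.33, 8.46)

result <- iqr_outliers(fares)
print(result$lower)     # lower fence
print(result$upper)     # upper fence
print(result$outliers)  # values outside the fences
```

Reporting the two fences alongside the flagged values makes the calculation reproducible for the reader. Note that quantile() uses its default interpolation method here, so the quartiles may differ slightly from those computed by other tools.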

Part VI: Reporting

  1. Write a detailed APA-formatted report that summarizes the findings from the Exploratory Data Analysis for an audience of data analyst peers. Include details on the methodology and tools and support the findings with appropriate visualizations and screenshots of outputs from the use of the software tools.

Please respond to each of the task prompts, organized by the Parts of the assignment. Your work for Parts I-V should be a total of 3-4 pages (approximately 1 page per part).

The report required for Part VI is a stand-alone document that organizes elements of your responses to the other task prompts into a professional report summarizing your actions, code, and findings. It should not exceed 3 pages.

You will upload a zipped file that includes the responses to the task prompts, your report, and all supporting files, including the R code and Tableau files.

Use Saudi Electronic University academic writing standards and APA style guidelines, citing at least two references in support of your work, in addition to your text and assigned readings. Include a title and reference page.

You are strongly encouraged to submit all assignments to the Turnitin Originality Check prior to submitting them to your instructor for grading. If you are unsure how to submit an assignment to the Originality Check tool, review the Turnitin Originality Check Student Guide.

