## Table of Contents

1. Data Visualization

1.1 What is data?

1.2 What is data visualization?

1.3 R for data visualization

1.4 Data frames

1.5 Bar charts

1.6 Pie charts

1.7 Scatter plots

1.8 Line charts

2. Descriptive Statistics

2.1 Survey sampling

2.2 Measures of center

2.3 Measures of spread

2.4 Box plots

2.5 Histograms

2.6 Violin plots

3. Probability and Counting

3.1 Introduction to probability

3.2 Addition rule and complements

3.3 Multiplication rule and independence

3.4 Conditional probability and Bayes’ Theorem

3.5 Combinations and permutations

4. Probability Distributions

4.1 Introduction to random variables

4.2 Properties of discrete probability distributions

4.3 Binomial distribution

4.4 Hypergeometric distribution

4.5 Poisson distribution

4.6 Properties of continuous probability distributions

4.7 Normal distribution

4.8 Student’s t-distribution

4.9 F-distribution

4.10 Chi-square distribution

5. Inferential Statistics

5.1 Confidence intervals

5.2 Confidence intervals for population means

5.3 Confidence intervals for population proportions

5.4 Hypothesis tests

5.5 One-sample hypothesis tests for population means

5.6 One-sample z-test for population proportions

5.7 Two-sample hypothesis tests for population means

5.8 Two-sample z-test for population proportions

5.9 Analysis of variance (ANOVA)

5.10 Chi-square tests for categorical variables

6. Linear Regression

6.1 Simple linear regression (SLR)

6.2 SLR assumptions

6.3 Correlation coefficient and coefficient of determination

6.4 Interpreting linear models

6.5 Testing SLR parameters

6.6 Multiple regression

6.7 Categorical predictors and non-linear relationships

7. Time Series Analysis

7.1 What is a time series?

7.2 Time series patterns and stationarity

7.3 Moving average and exponential smoothing forecasting

7.4 Forecasting using regression

8. Monte Carlo Methods

8.1 What is a Monte Carlo simulation?

8.2 Building simulations

8.3 Optimization and forecasting

9. Data Mining

9.1 What is data mining?

9.2 Data preparation

9.3 Model evaluation

9.4 Supervised learning

9.5 Unsupervised learning

10. Ethics

10.1 Misleading statistics

10.2 Abuse of the p-value

10.3 Data privacy

10.4 Ethical guidelines

11. Appendix

11.1 z-distribution table

11.2 t-distribution table

11.3 Chi-squared distribution table

12. Additional Material

12.1 Tables

12.2 Spreadsheets

12.3 Spreadsheet plotting

12.4 Dot plots

12.5 Animations

12.6 Data visualization: Case study

12.7 Dashboards

12.8 Linear regression example

12.9 What-if analysis

12.10 Advanced simulations

## What You’ll Find In This zyBook:

### More action with less text.

- An exceptionally student-focused introduction to data analytics
- Traditionally hard topics are made learnable through hundreds of animations and learning questions
- Included statistics/probability background enables all students to succeed
- Commonly combined with “Statistics for Data Analytics”; numerous configurations possible

## The zyBooks Approach

### Less text doesn’t mean less learning.

Data analytics is one of the fastest-growing subjects today. Techniques in data analysis can help solve a range of problems, such as identifying new opportunities to generate profit or improving health outcomes in hospitals. Because the subject relies heavily on statistics, it often poses difficulties for students. This zyBook represents entirely new material created specifically to help students master the subject. Written natively for the modern web, the zyBook uses less text and teaches through hundreds of animations and learning questions.

The zyBook provides the background in probability and statistics needed to understand and apply the techniques covered in later chapters, such as time series analysis, Monte Carlo simulation, and data mining. A chapter on ethics provides real examples and encourages professionalism and safety.

In recent years, R has become popular among data analysts, researchers, and statisticians because of the wide variety of statistical computing and graphical modeling packages available in the language. Links to a live coding environment are provided to allow students to practice writing R functions for data visualization, inferential statistics, linear regression, and other algorithms.