## Table of Contents

1. Data Visualization

1.1 What is data?

1.2 What is data visualization?

1.3 Python for data visualization

1.4 Data frames

1.5 Bar charts

1.6 Pie charts

1.7 Scatter plots

1.8 Line charts

2. Descriptive Statistics

2.1 Survey sampling

2.2 Measures of center

2.3 Measures of spread

2.4 Box plots

2.5 Histograms

2.6 Violin plots

3. Probability and Counting

3.1 Introduction to probability

3.2 Addition rule and complements

3.3 Multiplication rule and independence

3.4 Conditional probability and Bayes’ Theorem

3.5 Combinations and permutations

4. Probability Distributions

4.1 Introduction to random variables

4.2 Properties of discrete probability distributions

4.3 Binomial distribution

4.4 Hypergeometric distribution

4.5 Poisson distribution

4.6 Properties of continuous probability distributions

4.7 Normal distribution

4.8 Student’s t-distribution

4.9 F-distribution

4.10 Chi-square distribution

5. Inferential Statistics

5.1 Confidence intervals

5.2 Confidence intervals for population means

5.3 Confidence intervals for population proportions

5.4 Hypothesis tests

5.5 One-sample hypothesis tests for population means

5.6 One-sample z-test for population proportions

5.7 Two-sample hypothesis tests for population means

5.8 Two-sample z-test for population proportions

5.9 Analysis of variance (ANOVA)

5.10 Chi-square tests for categorical variables

6. Linear Regression

6.1 Simple linear regression (SLR)

6.2 SLR assumptions

6.3 Correlation coefficient and coefficient of determination

6.4 Interpreting linear models

6.5 Testing SLR parameters

6.6 Multiple regression

6.7 Categorical predictors and non-linear relationships

7. Time Series Analysis

7.1 What is a time series?

7.2 Time series patterns and stationarity

7.3 Moving average and exponential smoothing forecasting

7.4 Forecasting using regression

8. Monte Carlo Methods

8.1 What is a Monte Carlo simulation?

8.2 Building simulations

8.3 Optimization and forecasting

9. Data Mining

9.1 What is data mining?

9.2 Data preparation

9.3 Model evaluation

9.4 Supervised learning

9.5 Unsupervised learning

10. Ethics

10.1 Misleading statistics

10.2 Abuse of the p-value

10.3 Data privacy

10.4 Ethical guidelines

11. Appendix

11.1 z-distribution table

11.2 t-distribution table

11.3 Chi-square distribution table

## What You’ll Find In This zyBook:

### More action with less text.

- An exceptionally student-focused introduction to data analytics
- Traditionally hard topics are made learnable through hundreds of animations and learning questions
- Included statistics/probability background enables all students to succeed
- Python coding practice is provided throughout to allow students to experiment
- Commonly combined with “Statistics for Data Analytics”; numerous configurations possible

## The zyBooks Approach

### Less text doesn’t mean less learning.

Data analytics is one of the fastest-growing subjects today. Techniques in data analysis can help solve a variety of problems, such as identifying new opportunities to generate profit or improving health outcomes in hospitals. Because the subject relies heavily on statistics, it often poses difficulties for students. This zyBook represents entirely new material created specifically to help students master the subject. Written natively for the modern web, the zyBook uses less text and teaches through hundreds of animations and learning questions.

The zyBook provides the solid background in probability and statistics needed to understand and apply techniques covered in later chapters, such as time series analysis, Monte Carlo simulation, and data mining. A chapter on ethics provides real examples and encourages professionalism and safety.

In recent years, Python has gained ground as a popular language among data analysts, researchers, and statisticians because of its clean syntax and popularity among software developers. Links to a live coding environment are provided so students can practice writing Python functions for data visualization, inferential statistics, linear regression, and other algorithms.