## Table of Contents

1. Data and Sampling

1.1 What is data?

1.2 What is statistics?

1.3 Observational studies and experiments

1.4 Surveys and sampling methods

2. Data Visualization

2.1 What is data visualization?

2.2 Python for data visualization

2.3 Data frames

2.4 Bar charts

2.5 Pie charts

2.6 Scatter plots

2.7 Line charts

2.8 Data visualization example

3. Descriptive Statistics

3.1 Measures of center

3.2 Measures of variability

3.3 Box plots

3.4 Histograms

3.5 Violin plots

4. Probability and Counting

4.1 Introduction to probability

4.2 Addition rule and complements

4.3 Multiplication rule and independence

4.4 Conditional probability

4.5 Bayes’ Theorem

4.6 Combinations and permutations

5. Probability Distributions

5.1 Introduction to random variables

5.2 Properties of discrete probability distributions

5.3 Binomial distribution

5.4 Hypergeometric distribution

5.5 Poisson distribution

5.6 Properties of continuous probability distributions

5.7 Normal distribution

5.8 Student’s t-Distribution

5.9 F-distribution

5.10 Chi-square distribution

6. Inferential Statistics

6.1 Confidence intervals

6.2 Confidence intervals for population means

6.3 Confidence intervals for population proportions

6.4 Hypothesis testing

6.5 Hypothesis test for a population mean

6.6 Hypothesis test for a population proportion

6.7 Hypothesis test for the difference between two population means

6.8 Hypothesis test for the difference between two population proportions

6.9 One-way analysis of variance (one-way ANOVA)

7. Chi-square Tests for Categorical Data

7.1 Categorical data

7.2 Fisher’s exact test

7.3 Introduction to chi-square tests

7.4 Chi-square test for homogeneity and independence

7.5 Relative risk and odds ratios

8. Linear Regression

8.1 Introduction to simple linear regression (SLR)

8.2 SLR assumptions

8.3 Correlation and coefficient of determination

8.4 Interpreting SLR models

8.5 Confidence and prediction intervals for SLR models

8.6 Testing SLR parameters

8.7 Linear regression example

9. Multiple Linear Regression

9.1 Introduction to multiple regression

9.2 Multiple regression assumptions and diagnostics

9.3 Coefficient of multiple determination

9.4 Multicollinearity

9.5 Interpreting multiple regression models

9.6 Confidence and prediction intervals for MLR models

9.7 Testing multiple regression parameters

9.8 Multiple regression example

10. Higher Order Regression

10.1 Categorical predictor variables

10.2 Interaction terms

10.3 Quadratic models

10.4 Complete second order models

10.5 Comparing nested models: F-test

10.6 Higher order models

11. Logistic Regression

11.1 Introduction to logistic regression (LR)

11.2 Estimating LR parameters

11.3 LR models with multiple predictors

11.4 LR assumptions and diagnostics

11.5 Testing LR parameters

11.6 Interpreting LR models

11.7 Comparing nested models: Likelihood ratio tests and AIC

11.8 Classification using LR models

12. Transformations

12.1 Logarithmic transformations

12.2 Ladder of powers

12.3 Box-Cox transformation

13. Stepwise Regression

13.1 Introduction to stepwise regression

13.2 Forward selection

13.3 Backward selection

13.4 Stepwise selection

14. Non-parametric Analysis

14.1 Parametric vs. nonparametric statistics

14.2 Resampling: Randomization and bootstrapping

14.3 Wilcoxon rank-sum test

14.4 Kruskal-Wallis test

14.5 Multiple tests

15. Introduction to Data Mining

15.1 What is data mining?

15.2 Data formats

15.3 Machine learning methods

15.4 scikit-learn

16. Data Cleansing and Preparation

16.1 What is data cleansing?

16.2 Handling missing values

16.3 Outliers

16.4 Standardization and normalization

16.5 Dimensionality reduction

16.6 Training, validation, and test sets

17. Supervised Learning

17.1 k nearest neighbors

17.2 Logistic regression

17.3 Evaluating classification models

17.4 Supervised learning examples

18. Unsupervised Learning

18.1 Clustering methods

18.2 Association rules

18.3 Evaluating clustering models

18.4 Unsupervised learning examples

19. Decision Tree Learning

19.1 Introduction to decision trees

19.2 Classification and regression trees (CART)

19.3 ID3 and C4.5 algorithms

19.4 Classification tree example

19.5 Regression tree example

19.6 Random forests

20. Principal Component Analysis

20.1 Introduction to principal component analysis (PCA)

20.2 Calculating principal components for two variables

20.3 Extending PCA to more variables

20.4 Determining the number of components

20.5 Interpreting principal components

21. Time Series

21.1 What is a time series?

21.2 Time series patterns and stationarity

21.3 Moving average and exponential smoothing forecasting

21.4 Forecasting using regression

22. Monte Carlo Methods

22.1 What is a Monte Carlo simulation?

22.2 Building simulations

22.3 Optimization and forecasting

22.4 What-if analysis

22.5 Advanced simulations

23. Ethics

23.1 Misleading statistics

23.2 Abuse of the p-value

23.3 Data privacy

23.4 Ethical guidelines

24. Appendix A: Distribution tables

24.1 t-distribution table

24.2 z-distribution table

24.3 Chi-squared distribution table

25. Appendix B: CSV Files

25.1 Data sets

## Teach applied statistics through a powerful interactive approach that includes programming using Jupyter Notebooks

**Applied Statistics with Data Analytics (Python)** focuses on statistical concepts and techniques used in data analysis. Important Python libraries are introduced to visualize data, perform statistical inference, and make predictions.

- Packed with interactive animations, questions and learning activities to help students master the material
- Covers elementary statistical concepts, modeling relationships between two or more variables, and advanced topics such as time series and Monte Carlo methods
- Data analytics and data mining techniques such as logistic regression, clustering, and decision trees are also covered
- Built-in Python environment and Jupyter Notebooks allow students to experiment with real-world data sets

## What is a zyBook?

**Applied Statistics with Data Analytics (Python)** is a web-native, interactive zyBook that helps students visualize concepts to learn faster and more effectively than with a traditional textbook. (Check out our research.)

Since 2012, over 1,700 academic institutions have adopted web-native zyBooks to transform their STEM education.

### zyBooks benefit students and instructors:

- Instructor benefits
  - Customize your course by reorganizing existing content or adding your own
  - Continuous publication model automatically updates your course with the latest content and technologies
  - Robust reporting gives you insight into students’ progress, reading and participation
  - Save time with auto-graded labs and challenge activities that seamlessly integrate with your LMS gradebook

- Student benefits
  - Learning questions and other content serve as an interactive form of reading
  - Instant feedback on labs and homework
  - Concepts come to life through extensive animations embedded into the interactive content
  - Save chapters as PDFs to reference the material at any time

## Senior Contributors

**Heather Berrier**

*Content Developer, Mathematics / PhD Physics and Astronomy, Univ. of California, Irvine*

**Joel Berrier**

*Assistant Professor, Dept. of Physics and Astronomy, Univ. of Nebraska, Kearney / PhD Physics and Astronomy, UC Irvine*

**Chris Chan**

*Director, Content Development / MA Mathematics, San Francisco State Univ.*

**Scott Nestler**

*Associate Teaching Professor, Mendoza College of Business, Univ. of Notre Dame / PhD Management Science, Univ. of Maryland, College Park*

**Iain Pardoe**

*Mathematics and Statistics Instructor, Thompson Rivers Univ., Pennsylvania State Univ., and Statistics.com / PhD Statistics, Univ. of Minnesota*

**Rodney X. Sturdivant**

*Professor, Dept. of Mathematics and Physics, Azusa Pacific Univ. / PhD Biostatistics, Univ. of Massachusetts, Amherst*

**Krista Watts**

*Assistant Professor, Director, Center for Data Analysis and Statistics, United States Military Academy, West Point / PhD Biostatistics, Harvard*