## Table of Contents

1. Data Visualization

1.1 What is data?

1.2 What is data visualization?

1.3 Python for data visualization

1.4 Data frames

1.5 Bar charts

1.6 Pie charts

1.7 Scatter plots

1.8 Line charts

1.9 Data visualization example

2. Descriptive Statistics

2.1 What is statistics?

2.2 Measures of center

2.3 Measures of variability

2.4 Box plots

2.5 Histograms

3. Probability and Counting

3.1 Introduction to probability

3.2 Addition rule and complements

3.3 Multiplication rule and independence

3.4 Conditional probability

3.5 Bayes’ Theorem

3.6 Combinations and permutations

4. Probability Distributions

4.1 Introduction to random variables

4.2 Properties of discrete probability distributions

4.3 Binomial distribution

4.4 Hypergeometric distribution

4.5 Poisson distribution

4.6 Properties of continuous probability distributions

4.7 Normal distribution

4.8 Student’s t-Distribution

4.9 F-distribution

4.10 Chi-square distribution

5. Inferential Statistics

5.1 Confidence intervals

5.2 Confidence intervals for population means

5.3 Confidence intervals for population proportions

5.4 Hypothesis testing

5.5 Hypothesis test for a population mean

5.6 Hypothesis test for a population proportion

5.7 Hypothesis test for the difference between two population means

5.8 Hypothesis test for the difference between two population proportions

5.9 One-way analysis of variance (one-way ANOVA)

6. Linear Regression

6.1 Introduction to simple linear regression (SLR)

6.2 SLR assumptions

6.3 Correlation and coefficient of determination

6.4 Interpreting SLR models

6.5 Confidence and prediction intervals for SLR models

6.6 Testing SLR parameters

6.7 Multiple regression

6.8 Categorical predictor variables

6.9 Interaction terms

6.10 Linear regression example

7. Chi-square Tests for Categorical Data

7.1 Categorical data

7.2 Fisher’s exact test

7.3 Introduction to chi-square tests

7.4 Chi-square test for homogeneity and independence

7.5 Relative risk and odds ratios

8. Introduction to Data Mining

8.1 What is data mining?

8.2 Data formats

8.3 Machine learning methods

8.4 scikit-learn

9. Data Cleansing and Preparation

9.1 What is data cleansing?

9.2 Handling missing values

9.3 Outliers

9.4 Standardization and normalization

9.5 Dimensionality reduction

9.6 Training, validation, and test sets

10. Supervised Learning

10.1 k-nearest neighbors

10.2 Logistic regression

10.3 Evaluating classification models

10.4 Supervised learning examples

11. Unsupervised Learning

11.1 Clustering methods

11.2 Association rules

11.3 Evaluating clustering models

11.4 Unsupervised learning examples

12. Decision Tree Learning

12.1 Introduction to decision trees

12.2 Classification and regression trees (CART)

12.3 ID3 and C4.5 algorithms

12.4 Classification tree example

12.5 Regression tree example

12.6 Random forests

13. Ethics

13.1 Misleading statistics

13.2 Abuse of the p-value

13.3 Data privacy

13.4 Ethical guidelines

14. Appendix A: Distribution Tables

14.1 t-distribution table

14.2 z-distribution table

14.3 Chi-square distribution table

15. Appendix B: CSV Files

15.1 Data sets

16. Appendix C: Additional Material

16.1 Violin plots

16.2 What is a time series?

16.3 Time series patterns and stationarity

16.4 Moving average and exponential smoothing forecasting

16.5 Forecasting using regression

16.6 What is a Monte Carlo simulation?

16.7 Building simulations

16.8 Optimization and forecasting

16.9 What-if analysis

16.10 Advanced simulations

## What You’ll Find In This zyBook:

### More action with less text.

- An exceptionally student-focused introduction to applied statistics.
- Traditionally difficult topics are made easier using animations and learning questions.
- Several chapters on data analytics and data mining algorithms are included.
- Python coding environments are provided throughout to allow students to experiment.
- Auto-graded programming activities are included using a built-in programming environment.
- Commonly combined with “Applied Regression Analysis”; numerous configurations are possible.

## The zyBooks Approach

### Less text doesn’t mean less learning.

This zyBook provides a concise introduction to bivariate and multivariate statistics using an applied approach with real-world data. Equations for common statistical quantities are provided, but most concepts are explained using animations rather than rigorous mathematical proof. This content is recommended for STEM majors who may not have a solid foundation in statistics but want a friendly introduction to data analytics. Python coding environments are provided that allow students to experiment with datasets that are both interesting and relevant to their day-to-day lives.

## Senior Contributors

**Joel Berrier**

*Assistant Professor, Dept. of Physics and Astronomy, Univ. of Nebraska, Kearney, Ph.D. Physics and Astronomy, UC Irvine*

**Chris Chan**

*Content lead: Mathematics, zyBooks, M.A. Mathematics, San Francisco State Univ.*

**Scott Nestler**

*Associate Teaching Professor, Mendoza College of Business, Univ. of Notre Dame, Ph.D. Management Science, Univ. of Maryland, College Park*

**Iain Pardoe**

*Mathematics and Statistics Instructor, Thompson Rivers Univ., Pennsylvania State Univ., and Statistics.com, Ph.D. Statistics, Univ. of Minnesota*

**Ron Siu**

*Content developer, zyBooks, M.S. Biomedical Engineering, UCLA; M.S. Developmental Biology, Stanford*

**Rodney X. Sturdivant**

*Professor, Dept. of Mathematics and Physics, Azusa Pacific Univ., Ph.D. Biostatistics, U Mass Amherst*

**Krista Watts**

*Assistant Professor, Director—Center for Data Analysis and Statistics, United States Military Academy, West Point, Ph.D. Biostatistics, Harvard*