## Table of Contents

1.1 What is data?
1.2 What is statistics?
1.3 Observational studies and experiments
1.4 Surveys and sampling methods

2.1 What is data visualization?
2.2 Python for data visualization
2.3 Data frames
2.4 Bar charts
2.5 Pie charts
2.6 Scatter plots
2.7 Line charts
2.8 Data visualization example

3.1 Measures of center
3.2 Measures of variability
3.3 Box plots
3.4 Histograms
3.5 Violin plots

4.1 Introduction to probability
4.2 Addition rule and complements
4.3 Multiplication rule and independence
4.4 Conditional probability
4.5 Bayes’ Theorem
4.6 Combinations and permutations

5.1 Introduction to random variables
5.2 Properties of discrete probability distributions
5.3 Binomial distribution
5.4 Hypergeometric distribution
5.5 Poisson distribution
5.6 Properties of continuous probability distributions
5.7 Normal distribution
5.8 Student’s t-Distribution
5.9 F-distribution
5.10 Chi-square distribution

6.1 Confidence intervals
6.2 Confidence intervals for population means
6.3 Confidence intervals for population proportions
6.4 Hypothesis testing
6.5 Hypothesis test for a population mean
6.6 Hypothesis test for a population proportion
6.7 Hypothesis test for the difference between two population means
6.8 Hypothesis test for the difference between two population proportions
6.9 One-way analysis of variance (one-way ANOVA)

7.1 Categorical data
7.2 Fisher’s exact test
7.3 Introduction to chi-square tests
7.4 Chi-square test for homogeneity and independence
7.5 Relative risk and odds ratios

8.1 Introduction to simple linear regression (SLR)
8.2 SLR assumptions
8.3 Correlation and coefficient of determination
8.4 Interpreting SLR models
8.5 Confidence and prediction intervals for SLR models
8.6 Testing SLR parameters
8.7 Linear regression example

9.1 Introduction to multiple regression
9.2 Multiple regression assumptions and diagnostics
9.3 Coefficient of multiple determination
9.4 Multicollinearity
9.5 Interpreting multiple regression models
9.6 Confidence and prediction intervals for MLR models
9.7 Testing multiple regression parameters
9.8 Multiple regression example

10.1 Categorical predictor variables
10.2 Interaction terms
10.3 Quadratic models
10.4 Complete second order models
10.5 Comparing nested models: F-test
10.6 Higher order models

11.1 Introduction to logistic regression (LR)
11.2 Estimating LR parameters
11.3 LR models with multiple predictors
11.4 LR assumptions and diagnostics
11.5 Testing LR parameters
11.6 Interpreting LR models
11.7 Comparing nested models: Likelihood ratio tests and AIC
11.8 Classification using LR models

12.1 Logarithmic transformations
12.2 Ladder of powers
12.3 Box-Cox transformation

13.1 Introduction to stepwise regression
13.2 Forward selection
13.3 Backward selection
13.4 Stepwise selection

14.1 Parametric vs. nonparametric statistics
14.2 Resampling: Randomization and bootstrapping
14.3 Wilcoxon rank-sum test
14.4 Kruskal-Wallis test
14.5 Multiple tests

15.1 What is data mining?
15.2 Data formats
15.3 Machine learning methods
15.4 scikit-learn

16.1 What is data cleansing?
16.2 Handling missing values
16.3 Outliers
16.4 Standardization and normalization
16.5 Dimensionality reduction
16.6 Training, validation, and test sets

17.1 k nearest neighbors
17.2 Logistic regression
17.3 Evaluating classification models
17.4 Supervised learning examples

18.1 Clustering methods
18.2 Association rules
18.3 Evaluating clustering models
18.4 Unsupervised learning examples

19.1 Introduction to decision trees
19.2 Classification and regression trees (CART)
19.3 ID3 and C4.5 algorithms
19.4 Classification tree example
19.5 Regression tree example
19.6 Random forests

20.1 Introduction to principal component analysis (PCA)
20.2 Calculating principal components for two variables
20.3 Extending PCA to more variables
20.4 Determining the number of components
20.5 Interpreting principal components

21.1 What is a time series?
21.2 Time series patterns and stationarity
21.3 Moving average and exponential smoothing forecasting
21.4 Forecasting using regression

22.1 What is a Monte Carlo simulation?
22.2 Building simulations
22.3 Optimization and forecasting
22.4 What-if analysis
22.5 Advanced simulations

23.1 Misleading statistics
23.2 Abuse of the p-value
23.3 Data privacy
23.4 Ethical guidelines

24.1 t-distribution table
24.2 z-distribution table
24.3 Chi-squared distribution table

25.1 Data sets

## What You’ll Find In This zyBook:

### More action with less text.

• An exceptionally student-focused introduction to applied statistics.
• Traditionally difficult topics are made easier using animations and learning questions.
• Several chapters on data analytics and data mining algorithms are included.
• Python coding environments are provided throughout to allow students to experiment.
• Auto-graded programming activities are included using a built-in programming environment.

## The zyBooks Approach

### Less text doesn’t mean less learning.

This zyBook provides a concise introduction to bivariate and multivariate statistics using an applied approach with real-world data. Equations for common statistical quantities are provided, but most concepts are explained using animations rather than rigorous mathematical proof. This content is recommended for STEM majors who may not have a solid foundation in statistics but want a friendly introduction to data analytics. Applied Statistics with Data Analytics gives an overview of elementary statistical concepts, modeling relationships between two or more variables, and advanced topics such as time series and Monte Carlo methods. Python coding environments are provided that allow students to experiment with datasets that are both interesting and relevant to their day-to-day lives.

“The most striking aspect of ZyBooks for me as an instructor has been the ability to introduce a topic and then point my students to specific exercises/activities in ZyBooks that would not only expound on the concept but allow them to practice them with confidence.”

## Senior Contributors

Heather Berrier
Content Developer, Mathematics / zyBooks / PhD Physics and Astronomy, UC Irvine

Joel Berrier
Assistant Professor / Dept. of Physics and Astronomy / Univ. of Nebraska, Kearney / PhD Physics and Astronomy, UC Irvine

Chris Chan
Sr. Manager, Content Development, Mathematics, Statistics, and Data Science / zyBooks / MA Mathematics, San Francisco State Univ.

Scott Nestler
Associate Teaching Professor / Mendoza College of Business / Univ. of Notre Dame / PhD Management Science, Univ. of Maryland, College Park

Iain Pardoe
Mathematics and Statistics Instructor / Thompson Rivers Univ., Pennsylvania State Univ., and Statistics.com / PhD Statistics, Univ. of Minnesota

Rodney X. Sturdivant
Professor, Dept. of Mathematics and Physics / Azusa Pacific Univ. / PhD Biostatistics, U Mass Amherst

Krista Watts
Assistant Professor, Director—Center for Data Analysis and Statistics / United States Military Academy, West Point / PhD Biostatistics, Harvard