## Table of Contents

1. Introduction to Data Science

1.1 Historical overview

1.2 Why data science?

1.3 Careers in data science

1.4 Data science lifecycle

1.5 Ethics in data science

1.6 Case study: Netflix

2. Python for Data Science

2.1 Programming with Python and Jupyter

2.2 Python data types

2.3 Python functions

2.4 Data science packages

2.5 NumPy package

2.6 pandas package

2.7 matplotlib package

3. Probability and Statistics

3.1 Data collection

3.2 Descriptive statistics

3.3 Probability

3.4 Probability distributions

3.5 Inferential statistics

3.6 Inference for proportions and means

4. SQL for Data Science

4.1 Relational databases

4.2 Simple queries

4.3 Special operators and clauses

4.4 Aggregate functions

4.5 Join queries

4.6 Subqueries

4.7 Queries in Python

5. Data Wrangling

5.1 Data wrangling

5.2 Structuring data

5.3 Cleaning data

5.4 Enriching data

6. Data Exploration

6.1 Visualizing data with one feature

6.2 Visualizing data with multiple features

6.3 Best practices for visualizing data

6.4 Tools for visualizing data

6.5 Performing exploratory data analysis

6.6 Detecting outliers

6.7 Case study: Penguins

7. Regression

7.1 Introduction to regression

7.2 Simple linear regression

7.3 Linear regression assumptions

7.4 Multiple linear regression

7.5 Logistic regression

8. Evaluating Model Performance

8.1 Model error

8.2 Binary classification metrics

8.3 Regression metrics

8.4 Training, validation, and test sets

8.5 Cross-validation

8.6 Bootstrap method

8.7 Comparing models

9. Supervised Learning

9.1 Introduction to supervised learning

9.2 K-nearest neighbors

9.3 Naive Bayes classification

9.4 Support vector machines

10. Unsupervised Learning

10.1 Introduction to unsupervised learning

10.2 K-means clustering

10.3 Hierarchical clustering

10.4 Detecting outliers using DBSCAN

10.5 Analyzing factors

10.6 Analyzing factors using PCA

11. Decision Trees

11.1 Introduction to decision trees

11.2 Regression trees

11.3 Classification trees

11.4 Random forests

12. Artificial Neural Networks

12.1 Introduction to artificial neural networks

12.2 Single-layer perceptron

12.3 Nonlinear activation functions

12.4 Multilayer perceptron

13. Ensemble Techniques

13.1 Introduction to ensemble models

13.2 Boosting

13.3 Bagging

13.4 Stacking

14. Appendix

14.1 Datasets: CSV files

## What You’ll Find In This zyBook:

### More action with less text.

- Builds student understanding and confidence through learning questions and coding activities
- Students learn the skills required for the more quantitative and technical aspects of data science and machine learning
- Each section covers topics from a conceptual standpoint without assuming prerequisite knowledge in statistics and programming
- Jupyter Notebook integration lets students write and edit live code, create data visualizations, and experiment by changing the parameters of different models
- Test bank with more than 330 questions

## The zyBooks Approach

### Less text doesn’t mean less learning.

The Data Science Foundations with Python zyBook provides an interactive introduction to common algorithms and techniques in data science. It covers data preprocessing, regression, supervised and unsupervised learning algorithms, decision trees, neural networks, ensemble methods, and model evaluation.

## Authors

**Chris Chan**

*Senior Manager, Content Development in Math, Stats, and Data Science / zyBooks / M.A. in Mathematics / San Francisco State University*

**Matt Rissler**

*Data Science Content Developer / Ph.D. in Mathematics / University of Notre Dame*

**Aimee Schwab-McCoy**

*Data Science Content Developer / Ph.D. in Statistics / University of Nebraska–Lincoln*