Table of Contents

1.1 Historical overview
1.2 Why data science?
1.3 Careers in data science
1.4 Data science lifecycle
1.5 Ethics in data science
1.6 Case study: Netflix

2.1 Data collection
2.2 Descriptive statistics
2.3 Probability
2.4 Probability distributions
2.5 Inferential statistics
2.6 Inference for proportions and means

3.1 Data wrangling
3.2 Structuring data
3.3 Cleaning data
3.4 Enriching data

4.1 Visualizing data with one feature
4.2 Visualizing data with multiple features
4.3 Best practices for visualizing data
4.4 Tools for visualizing data
4.5 Performing exploratory data analysis
4.6 Detecting outliers

5.1 Introduction to regression
5.2 Simple linear regression
5.3 Linear regression assumptions
5.4 Multiple linear regression
5.5 Logistic regression

6.1 Model error
6.2 Binary classification metrics
6.3 Regression metrics
6.4 Training, validation, and test sets
6.5 Cross-validation
6.6 Bootstrap method
6.7 Comparing models

7.1 Introduction to supervised learning
7.2 K-nearest neighbors
7.3 Naive Bayes classification
7.4 Support vector machines

8.1 Introduction to unsupervised learning
8.2 K-means clustering
8.3 Hierarchical clustering
8.4 Detecting outliers using DBSCAN
8.5 Analyzing factors
8.6 Analyzing factors using PCA

9.1 Introduction to decision trees
9.2 Regression trees
9.3 Classification trees
9.4 Random forests

10.1 Introduction to artificial neural networks
10.2 Single-layer perceptron
10.3 Nonlinear activation functions
10.4 Multilayer perceptron

11.1 Introduction to ensemble models
11.2 Boosting
11.3 Bagging
11.4 Stacking

Teach data science with the only interactive introduction to foundational algorithms and techniques

Data Science Foundations is the first complete, interactive introduction that develops an applied understanding of topics in data science.

  • Covers topics from a conceptual standpoint without assuming prerequisite knowledge in statistics and programming
  • Includes data preprocessing, regression techniques, supervised and unsupervised learning algorithms, decision trees, neural networks, ensemble methods, and model evaluation techniques
  • Teaches the skills needed to go deeper into the more quantitative and technical aspects of data science and machine learning
  • Continuously updated with the latest advances in data science
  • Adopters have access to a test bank with questions for every chapter

What is a zyBook?

Data Science Foundations is a web-native, interactive zyBook that helps students visualize concepts to learn faster and more effectively than with a traditional textbook.

Since 2012, over 1,700 academic institutions have adopted web-native zyBooks to transform their STEM education.

zyBooks benefit students and instructors:

  • Instructor benefits
      • Customize your course by reorganizing existing content or adding your own
      • Continuous publication model updates your course with the latest content and technologies
      • Gain insight into students’ progress, reading and participation with robust reporting
      • Build quizzes and exams with over 250 included test questions
  • Student benefits
      • Learning questions and other content serve as an interactive form of reading
      • Instant feedback on labs and homework
      • Concepts come to life through extensive animations embedded into the interactive content
      • Save chapters as PDFs to reference the material at any time


Chris Chan
Senior Manager, Content Development in Math, Stats, and Data Science / zyBooks / MA in Mathematics / San Francisco State University

Matt Rissler
Data Science Content Developer / PhD in Mathematics / University of Notre Dame

Aimee Schwab-McCoy
Data Science Content Developer / PhD in Statistics / University of Nebraska–Lincoln
