Data Science 101 - is there a "consensus curriculum"?

Our previous posts in the Data Science Curriculum series have covered computing competencies and program outlines for data science majors. In our final post, we’ll explore the introduction to data science course.

Data science is still a young discipline

The earliest data science courses are only about 10 years old, which means that introductory data science courses vary widely from school to school. Unlike first programming or statistics courses, a true consensus curriculum for data science has yet to emerge. However, some topics are widely taught at the introductory level.

In fall 2019, researchers at Creighton University surveyed mathematics, statistics, and computer science faculty, asking them to indicate which of 34 topics were covered in their intro data science course. Sixty-eight faculty responded and completed the topic ranking.

The most common topics listed were:

Description	Proportion of courses
Exploratory data analysis	82%
Data cleaning and wrangling	75%
Data ethics and responsible data use	63%
Data curation and data quality	53%
Linear and logistic regression	53%
Reproducible research	51%
Data lifecycle and data collection	50%
Research methods	41%
Data architecture, data types, and data formats	40%
Text mining	40%
Customizing data visualizations	40%
Supervised machine learning	38%
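
As a quick way to explore these figures, the short Python sketch below filters the table for topics covered in at least half of the surveyed courses. The proportions are copied from the table above. The dictionary name `intro_topics` and the helper `topics_at_least` are hypothetical names chosen for illustration and do not come from the original study.

```python
# Proportions copied from the table above (share of surveyed intro courses).
# The variable and function names are illustrative assumptions.
intro_topics = {
    "Exploratory data analysis": 0.82,
    "Data cleaning and wrangling": 0.75,
    "Data ethics and responsible data use": 0.63,
    "Data curation and data quality": 0.53,
    "Linear and logistic regression": 0.53,
    "Reproducible research": 0.51,
    "Data lifecycle and data collection": 0.50,
    "Research methods": 0.41,
    "Data architecture, data types, and data formats": 0.40,
    "Text mining": 0.40,
    "Customizing data visualizations": 0.40,
    "Supervised machine learning": 0.38,
}


def topics_at_least(topics, threshold):
    """Return topic names whose proportion is >= threshold, highest first."""
    ranked = sorted(topics.items(), key=lambda item: item[1], reverse=True)
    return [name for name, share in ranked if share >= threshold]


# Topics that appear in at least half of the surveyed intro courses.
majority = topics_at_least(intro_topics, 0.50)
print(len(majority), "topics are covered in at least half of the courses")
for name in majority:
    print(f"{intro_topics[name]:.0%}  {name}")
```

Changing the threshold (for example, to 0.40) gives a rough sense of how quickly agreement falls off beyond the top few topics.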

Data exploration, data wrangling, and data ethics were the three most common topics in the intro data science course

Basic models such as linear and logistic regression were also common, as were the data lifecycle and data types. Supervised machine learning and applications such as text mining and custom visualizations rounded out the most common topics.

Other topics were frequently covered somewhere in the data science curriculum, but not necessarily in the introductory course. These included:

Linear algebra: matrix manipulation, eigenvalues, singularity (74% covered elsewhere)
Traditional statistical inference: hypothesis tests, confidence intervals (66%)
Relational and non-relational databases (59%)
Experimental design, modeling, and planning (57%)
Simulation-based inference: bootstrapping, randomization tests (53%)
Optimization and numerical algorithms (53%)
Systems engineering and software engineering principles (51%)
Unsupervised machine learning (47%)
Big data technologies: batch and parallel processing (46%)
Supervised machine learning (41%)
Cloud computing (41%)

Data science courses have continued to evolve since 2019, so some topics may be more or less important in your course. The Data Science Foundations zyBook covers all of the essential data science topics and more, allowing you to customize your curriculum. Please visit the How to Teach Data Science – zyBooks Guide for additional resources and best practices.

For more information, check out the original study: Aimee Schwab-McCoy, Catherine M. Baker & Rebecca E. Gasper (2021) Data Science in 2020: Computing, Curricula, and Challenges for the Next 10 Years, Journal of Statistics and Data Science Education, 29:1, S40-S50, DOI: 10.1080/10691898.2020.1851159

Dr. Aimee Schwab-McCoy
