Creating a Data Science Major

Dr. Aimee Schwab-McCoy

The previous post in our Data Science Curriculum series covered a set of knowledge areas for every data science graduate. However, data science programs may cover knowledge areas in a variety of ways.

As technology continues to evolve at an unprecedented pace, universities must adapt their curricula to meet the demands of the modern world. One area that has gained particular prominence is data science – the art of extracting insights and knowledge from vast amounts of data. By adding a Data Science major to your university’s academic offerings, you can equip students with the skills and knowledge needed to excel in today’s data-driven world. The move not only gives students cutting-edge expertise but also positions your institution as a leader in preparing the next generation of data scientists.

Caption: Flow chart of one possible Data Science major. (Park City Math Institute, 2017)

The Park City Math Institute Summer Undergraduate Faculty Program brought together 25 faculty from mathematics, statistics, and computer science in 2017 to build a set of curriculum guidelines for undergraduate data science programs. The PCMI faculty group suggested the following ten-course sequence for a data science major.

  1. Introduction to Data Science I and II: A two-term sequence covering exploring and manipulating data, basic coding, introduction to modeling, databases, introduction to data collection and statistical inference.
  2. Mathematics for Data Science I and II: A two-term sequence covering linear modeling and matrix computation, optimization, multivariate thinking, probabilistic thinking and modeling.
  3. Algorithms and Software Foundations: Algorithm design, programming concepts and data structures, tools and environments, scaling for big data.
  4. Data Curation: Databases and Data Management: Relational databases, streaming data, web scraping.
  5. Introduction to Statistics: Exploratory data analysis, estimation and testing, simulation and resampling, linear models, model selection and performance.
  6. Machine Learning: Classification and regression, algorithms, performance metrics and validation, data transformations, supervised and unsupervised learning, and ensemble methods.

Counting each two-term sequence as two courses, the list above accounts for eight of the ten courses. The final two suggestions are a course in an outside discipline, such as Business Analytics or Bioinformatics, and a capstone course. The data science capstone should cover professional skills and applications of ethics in data science.

Of course, creating ten brand-new courses is not an easy task and could take several years to implement. Luckily, existing courses in computer science, mathematics, and statistics departments can be used to fit the suggested curriculum. Better still, all courses have a zyBook!

The zyBooks Catalog includes:

By building a Data Science major, professors can design a curriculum that integrates theoretical foundations with hands-on project work, ensuring students are equipped with both the knowledge and practical skills necessary to thrive in the field. From foundational programming and statistical concepts to advanced machine learning algorithms and data visualization techniques, the curriculum can be tailored to provide a well-rounded education that prepares students for real-world data challenges.

The next and final entry in this blog series will look at the intro data science course in more detail, and explore the “must-have” topics for a first course in data science.

Author Bio

Dr. Aimee Schwab-McCoy

Before joining zyBooks, Aimee was a statistics professor at Creighton University, where she created a Data Science program. Aimee is an experienced statistics and data science education researcher and is passionate about developing engaging resources for data science learners.