data-science

📊 Path to a free self-taught education in Data Science!

19,943

3,706


Top Related Projects

go

25,496

The Open Source Data Science Masters

coding-interview-university

316,385

A complete computer science study plan to become a software engineer.

data-science-ipython-notebooks

28,305

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Data-Science-For-Beginners

29,342

10 Weeks, 20 Lessons, Data Science for All!

awesome-datascience

26,714

📝 An awesome Data Science repository to learn and apply for real world problems.

courses

4,096

Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1

Quick Overview

The ossu/data-science repository is a comprehensive, open-source curriculum for studying Data Science. It provides a structured learning path for individuals interested in pursuing a complete education in data science, covering topics from linear algebra and calculus to machine learning and data visualization. The curriculum is designed to be self-paced and freely accessible to anyone with an internet connection.

Pros

Comprehensive curriculum covering all major aspects of data science
Free and open-source, making education accessible to everyone
Regularly updated with community input to stay current with industry trends
Includes resources from reputable institutions and platforms (e.g., MIT, Stanford, Coursera)

Cons

Self-paced nature may be challenging for some learners who prefer structured schedules
Lack of formal certification or accreditation upon completion
Some linked courses may require paid subscriptions or have limited free access
May not provide the same networking opportunities as traditional educational programs

Note: As this is not a code library, the code example and quick start sections have been omitted.

Competitor Comparisons

go

25,496

The Open Source Data Science Masters

Pros of go

More concise and focused curriculum
Emphasizes practical, industry-relevant tools and technologies
Includes specific book and course recommendations

Cons of go

Less structured learning path
May not cover foundational topics as thoroughly
Fewer community-driven updates and contributions

Code Comparison

While neither repository contains actual code samples, they differ in how they present their content:

data-science uses a structured markdown format:

## Core Mathematics
### [Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/)
**Topics covered**:
`vectors and matrices`, `solving linear equations`, `vector spaces`, ...

go uses a simpler, list-based approach:

* Linear Algebra & Programming
* Statistics
* Machine Learning
* Deep Learning
* Natural Language Processing
* Big Data
* Data Visualization

Both repositories serve as curated lists of resources for learning data science, but they differ in their approach and level of detail. data-science offers a more comprehensive, structured curriculum with a focus on academic foundations, while go provides a more concise, industry-oriented list of resources. The choice between the two depends on the learner's background, goals, and preferred learning style.

coding-interview-university

316,385

A complete computer science study plan to become a software engineer.

Pros of coding-interview-university

More focused on computer science fundamentals and algorithms
Comprehensive coverage of data structures and system design
Includes practice problems and mock interviews

Cons of coding-interview-university

Less emphasis on practical data science skills and tools
Narrower scope, primarily targeting software engineering interviews
May not cover statistical analysis and machine learning in depth

Code comparison

While both repositories focus on educational content rather than code, coding-interview-university includes some short algorithm examples, such as this binary search in Python:

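# Example from coding-interview-university
def binary_search(items, target):
    """Return the index of target in the sorted list items, or None."""
    low = 0
    high = len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        guess = items[mid]
        if guess == target:
            return mid
        if guess > target:
            high = mid - 1  # target can only be in the lower half
        else:
            low = mid + 1  # target can only be in the upper half
    return None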

data-science doesn't typically include code snippets, as it's more of a curriculum outline.
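The binary search above can also be written with Python's standard `bisect` module. This is a minimal sketch, not code taken from either repository; the function name `binary_search_bisect` is hypothetical.

```python
from bisect import bisect_left


def binary_search_bisect(sorted_items, target):
    """Return the index of target in sorted_items, or None if absent."""
    # bisect_left finds the leftmost position where target could be inserted
    # while keeping the list sorted.
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return None


if __name__ == "__main__":
    print(binary_search_bisect([1, 3, 5, 7, 9], 7))
    print(binary_search_bisect([1, 3, 5, 7, 9], 4))
```

When the list contains duplicates, this version returns the leftmost matching index, whereas the hand-written loop may return any matching index.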

Summary

coding-interview-university is ideal for those preparing for software engineering interviews, with a strong focus on computer science fundamentals. data-science offers a broader curriculum covering various aspects of data science, including statistics, machine learning, and data visualization. The choice between the two depends on your career goals and whether you're targeting software engineering or data science roles.

data-science-ipython-notebooks

28,305

Pros of data-science-ipython-notebooks

Provides hands-on, practical examples in Jupyter notebooks
Covers a wide range of data science topics and libraries
Allows for immediate experimentation and code execution

Cons of data-science-ipython-notebooks

Lacks a structured curriculum or learning path
May not provide in-depth explanations or theoretical foundations
Could become outdated if not regularly maintained

Code Comparison

data-science-ipython-notebooks:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data-science:

No direct code examples available, as this repository focuses on providing a curriculum and learning resources rather than code snippets.

The data-science-ipython-notebooks repository offers practical code examples and hands-on learning experiences, while the data-science repository provides a structured curriculum and learning path for aspiring data scientists. The former is better suited for those who prefer learning by doing, while the latter offers a more comprehensive and organized approach to learning data science concepts and skills.

Data-Science-For-Beginners

29,342

10 Weeks, 20 Lessons, Data Science for All!

Pros of Data-Science-For-Beginners

More structured and guided learning path with a 10-week curriculum
Includes hands-on projects and quizzes for practical application
Designed for beginners with no prior data science knowledge

Cons of Data-Science-For-Beginners

Less comprehensive coverage of advanced topics compared to data-science
Focuses primarily on Microsoft tools and technologies
May not provide as much depth in mathematical foundations

Code Comparison

data-science:

# No specific code examples provided in the repository

Data-Science-For-Beginners:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
plt.scatter(df['x'], df['y'])
plt.show()

The data-science repository doesn't provide specific code examples, as it's more of a curated list of resources. In contrast, Data-Science-For-Beginners includes practical code snippets and exercises throughout its lessons, making it more hands-on for beginners.

Overall, Data-Science-For-Beginners offers a more structured and beginner-friendly approach, while data-science provides a comprehensive curriculum covering a broader range of topics and resources for those looking to dive deeper into the field of data science.

awesome-datascience

26,714

📝 An awesome Data Science repository to learn and apply for real world problems.

Pros of awesome-datascience

Broader coverage of data science topics and resources
More frequently updated with new content
Includes links to datasets, conferences, and job portals

Cons of awesome-datascience

Less structured learning path for beginners
May be overwhelming due to the sheer volume of resources
Lacks a clear curriculum or progression system

Code comparison

While both repositories primarily consist of curated lists and don't contain much code, awesome-datascience does include some basic markdown formatting:

## Machine Learning
- [Scikit-learn](http://scikit-learn.org/)
- [PyTorch](https://pytorch.org/)
- [TensorFlow](https://www.tensorflow.org/)

data-science, on the other hand, uses a more structured format for its curriculum:

### Introduction to Computer Science

**Topics covered**:
`computation`
`imperative programming`
`basic data structures and algorithms`
`and more`

Both repositories use markdown to organize their content, but data-science employs a more consistent structure throughout its curriculum.

courses

4,096

Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1

Pros of courses

Focused curriculum aligned with Johns Hopkins University's Data Science Specialization
Structured course materials with clear progression
Includes practical assignments and projects for hands-on learning

Cons of courses

Less comprehensive coverage of foundational computer science topics
May not be as frequently updated as community-driven resources
Limited to R programming language, lacking diversity in tools and technologies

Code Comparison

courses (R-focused):

library(dplyr)
data %>%
  filter(year > 2000) %>%
  group_by(category) %>%
  summarize(mean_value = mean(value))

data-science (illustrative Python equivalent; the repository itself contains no code):

import pandas as pd

df = pd.read_csv('data.csv')
filtered_df = df[df['year'] > 2000]
result = filtered_df.groupby('category')['value'].mean()

Both repositories offer valuable resources for learning data science, but they cater to different audiences and learning styles. courses provides a structured, university-aligned curriculum focused on R, while data-science offers a broader, community-driven approach covering multiple languages and technologies. The choice between them depends on individual learning preferences and career goals.
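Both snippets in this comparison express the same three steps: filter rows, group them by a key, and average a column. As a dependency-free sketch of what those steps do, the example below uses only the Python standard library. The sample rows are made up for illustration, and the function name `mean_value_by_category` is hypothetical; the column names `year`, `category`, and `value` follow the snippets above.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows, for illustration only.
rows = [
    {"year": 1999, "category": "a", "value": 10.0},
    {"year": 2005, "category": "a", "value": 2.0},
    {"year": 2010, "category": "a", "value": 4.0},
    {"year": 2015, "category": "b", "value": 7.0},
]


def mean_value_by_category(records, min_year=2000):
    """Average `value` per `category` over records with year > min_year."""
    groups = defaultdict(list)
    for record in records:
        if record["year"] > min_year:  # filter step
            groups[record["category"]].append(record["value"])  # group step
    # summarize step: one mean per category
    return {category: mean(values) for category, values in groups.items()}


if __name__ == "__main__":
    print(mean_value_by_category(rows))
```

The 1999 row is dropped by the filter, so category `a` is averaged over its two remaining values.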


README

Open Source Society University

📊 Path to a free self-taught education in Data Science!

About
Curricular Guideline
How to use this guide
Community
Prerequisites
Curriculum
How to contribute
Code of conduct
Team

About

This is a path for those of you who want to complete the Data Science undergraduate curriculum on your own time, for free, with courses from the best universities in the world.

In our curriculum, we give preference to MOOC (Massive Open Online Course) style courses because these courses were created with our style of learning in mind.

Curricular Guideline

OSSU Data Science uses the report Curriculum Guidelines for Undergraduate Programs in Data Science as our guide for course recommendation.

How to use this guide

Duration

It is possible to finish within about 2 years if you plan carefully and devote roughly 20 hours/week to your studies. Learners can use this spreadsheet to estimate their end date. Make a copy and input your start date and expected hours per week in the Timeline sheet. As you work through courses you can enter your actual course completion dates in the Curriculum Data sheet and get updated completion estimates.

Warning: While the spreadsheet is a useful tool for estimating the time you need to complete this curriculum, it may not be up to date with the curriculum. Use the spreadsheet only to estimate your time, and use the GitHub repo to see which courses to take.
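The kind of estimate described above can be approximated by hand. The sketch below is not the spreadsheet's actual formula. It assumes a total workload of 2,080 hours (2 years × 52 weeks × 20 hours/week, derived from the figures in the Duration section) and a constant weekly pace; the function name `estimate_end_date` is hypothetical.

```python
import math
from datetime import date, timedelta

# Assumption: total workload implied by "about 2 years at roughly 20 hours/week".
# This is an illustrative figure, not an official OSSU number.
ASSUMED_TOTAL_HOURS = 2 * 52 * 20


def estimate_end_date(start, hours_per_week, hours_done=0.0,
                      total_hours=ASSUMED_TOTAL_HOURS):
    """Project a completion date assuming a constant weekly pace."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    remaining = max(total_hours - hours_done, 0.0)
    weeks = math.ceil(remaining / hours_per_week)  # round up to whole weeks
    return start + timedelta(weeks=weeks)


if __name__ == "__main__":
    start = date(2025, 1, 6)
    print(estimate_end_date(start, hours_per_week=20))
    print(estimate_end_date(start, hours_per_week=10))
```

Halving the weekly hours doubles the projected duration, which is why it helps to re-run the estimate with your actual pace as you go.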

Order of the classes

Some courses can be taken in parallel, while others must be taken sequentially. All of the courses within a topic should be taken in the order listed in the curriculum. The graph below demonstrates how topics should be ordered.
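The ordering rule described above (some topics in sequence, independent topics in parallel) can be sketched as a topological sort. The topic names below come from the curriculum, but the prerequisite edges are assumptions made for illustration and are not the curriculum's actual graph; the function name `study_stages` is hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: topic -> set of topics that must come first.
# The edges are assumed for illustration only.
PREREQUISITES = {
    "Introduction to Data Science": set(),
    "Introduction to Computer Science": set(),
    "Data Structures and Algorithms": {"Introduction to Computer Science"},
    "Statistics & Probability": {"Introduction to Data Science"},
    "Machine Learning/Data Mining": {
        "Data Structures and Algorithms",
        "Statistics & Probability",
    },
}


def study_stages(prereqs):
    """Group topics into stages; topics in one stage can be taken in parallel."""
    sorter = TopologicalSorter(prereqs)
    sorter.prepare()  # raises graphlib.CycleError on circular prerequisites
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, topics in enumerate(study_stages(PREREQUISITES), start=1):
        print(f"Stage {number}: {', '.join(topics)}")
```

Each stage lists topics whose prerequisites are all complete, so anything within a stage may be studied side by side.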

Track your progress

Fork the GitHub repo into your own GitHub account and put ✅ next to the stuff you've completed as you complete it. This can serve as your kanban board and will be faster to implement than any other solution (giving you time to spend on the courses).
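A fork marked up this way can be summarized with a few lines of code. The sketch below assumes a hypothetical convention of one course per line, with a check mark placed somewhere on the line of each finished course; the sample text and the function name `progress` are made up for illustration.

```python
DONE_MARK = "\u2705"  # the check-mark emoji used to flag finished items


def progress(markdown_text, course_titles):
    """Return (completed, total) for the given course titles.

    A course counts as completed when some line mentions its title
    and also contains the check mark.
    """
    lines = markdown_text.splitlines()
    completed = sum(
        any(title in line and DONE_MARK in line for line in lines)
        for title in course_titles
    )
    return completed, len(course_titles)


# Made-up excerpt of a personal fork, using course names from the curriculum.
SAMPLE = (
    f"Introduction to Probability {DONE_MARK}\n"
    f"Intro to Descriptive Statistics {DONE_MARK}\n"
    "Intro to Inferential Statistics\n"
)

TITLES = [
    "Introduction to Probability",
    "Intro to Descriptive Statistics",
    "Intro to Inferential Statistics",
]

if __name__ == "__main__":
    done, total = progress(SAMPLE, TITLES)
    print(f"{done}/{total} courses completed")
```

Matching is by plain substring, so a title that is a prefix of another title would need a stricter comparison.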

Which programming languages should I use?

Python and R are heavily used in the Data Science community, and our courses teach you both. Remember, the important thing for each course is to internalize the core concepts and to be able to use them with whatever tool (programming language) you wish.

Content Policy

You must share only files that you are allowed to share. Do NOT violate the code of conduct that you sign at the beginning of your courses.

Community

We have a Discord server! This should be your first stop to talk with other OSSU students. Why don't you introduce yourself right now?

You can also interact through GitHub issues.

Add Open Source Society University to your LinkedIn profile!

Warning: There is some third-party, deprecated, or outdated material that you might find when searching for OSSU. We recommend that you ignore it and use only the OSSU Data Science GitHub repo. Some known outdated materials are:

An unmaintained and deprecated trello board

Third-party notion templates

Prerequisites

The Data Science curriculum assumes the student has taken high school math and statistics.

Introduction to Data Science

What is Data Science

Introduction to Computer Science

Students who already know basic programming in any language can skip this first course

Introduction to programming

Introduction to Computer Science and Programming Using Python

Introduction to Computational Thinking and Data Science

Data Structures and Algorithms

The Algorithms courses are taught in Java. If students need to learn Java, they should take this course first

Java Programming

Algorithms I: ArrayLists, LinkedLists, Stacks and Queues

Algorithms II: Binary Trees, Heaps, SkipLists and HashMaps

Algorithms III: AVL and 2-4 Trees, Divide and Conquer Algorithms

Algorithms IV: Pattern Matching, Dijkstra's, MST, and Dynamic Programming Algorithms

Multivariable Calculus

Statistics & Probability

Introduction to Probability

Intro to Descriptive Statistics

Intro to Inferential Statistics

Statistical Learning with Python by Stanford University on EdX (Textbook, Textbook resources) or Statistical Learning With R by Stanford University on EdX (Textbook, Textbook resources)

Data Science Tools & Methods

Tools for Data Science

Data Science Methodology

Data Science: Wrangling

Machine Learning/Data Mining

Supervised Machine Learning: Regression and Classification

Advanced Learning Algorithms

Unsupervised Learning, Recommenders, Reinforcement Learning

Intro to Machine Learning

Mining Massive Datasets

Process Mining

Final project

Part of learning is doing. The assignments and exams for each course are to prepare you to use your knowledge to solve real-world problems.

After you've completed the curriculum, you should identify a problem that you can solve using the knowledge you've acquired. You can create something entirely new, or you can improve some tool/program that you use and wish were better.

Students who would like more guidance in creating a project may choose to use a series of project-oriented courses. A sample of options is available on this page; many more exist, and at this point you should be capable of identifying a series that is interesting and relevant to you.

Congratulations

After completing the requirements of the curriculum above, you will have completed the equivalent of a full bachelor's degree in Data Science. Congratulations!

What is next for you? The possibilities are boundless and overlapping:

Look for a job as a data scientist!
Check out the readings for classic books you can read that will sharpen your skills and expand your knowledge.
Join a local data science meetup (e.g. via meetup.com).
Pay attention to emerging technologies in the world of data science.


How to contribute

You can open an issue and give us your suggestions as to how we can improve this guide, or what we can do to improve the learning experience.

You can also fork this project and send a pull request to fix any mistakes that you have found.

If you want to suggest a new resource, send a pull request adding such resource to the extras section. The extras section is a place where all of us will be able to submit interesting additional articles, books, courses and specializations.

Code of Conduct

OSSU's code of conduct.

Team

Curriculum Maintainer: Waciuma Wanjohi
Contributors: contributors

Contents

About

Curricular Guideline

How to use this guide

Duration

Order of the classes

Track your progress

Which programming languages should I use?

Content Policy

Community

Prerequisites

Curriculum

Introduction to Data Science

Introduction to Computer Science

Data Structures and Algorithms

Databases

Single Variable Calculus

Linear Algebra

Multivariable Calculus

Statistics & Probability

Data Science Tools & Methods

Machine Learning/Data Mining

Final project

Congratulations

How to contribute

Code of Conduct

Team
