courses

Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1

Top Related Projects

datasharing

6,615

The Leek group guide to data sharing

pydata-book

23,068

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

PythonDataScienceHandbook

44,377

Python Data Science Handbook: full text in Jupyter Notebooks

handson-ml

25,359

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

27,337

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

TensorFlow-Examples

43,663

TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)

Quick Overview

The DataScienceSpecialization/courses repository is a comprehensive collection of course materials for the Data Science Specialization offered by Johns Hopkins University on Coursera. It contains lecture slides, assignments, and supplementary resources for nine courses covering various aspects of data science, from R programming to machine learning and data products.

Pros

Comprehensive coverage of data science topics
High-quality, university-level content
Free and open-source materials
Regular updates and maintenance

Cons

May be overwhelming for beginners
Some content might become outdated over time
Requires self-discipline and motivation for self-paced learning
Limited interaction compared to paid Coursera courses

Code Examples

Because this repository is a collection of course materials rather than a code library, there is no API to demonstrate. It does, however, contain numerous R scripts, Markdown files, and R Markdown files with code snippets on data science topics.

Getting Started

Since this is not a code library, there's no specific installation or setup process. To get started with the course materials:

Visit the repository: https://github.com/DataScienceSpecialization/courses
Browse the course folders to find specific topics of interest
Download or clone the repository to access materials locally
Follow the course structure and complete assignments as desired
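
Once the repository has been cloned, browsing the course folders can be sketched in a few lines of Python. This is only an illustration: the helper name `list_course_folders` and the local path `courses` are assumptions, and the actual folder names depend on the current state of the repository.

```python
from pathlib import Path


def list_course_folders(repo_root):
    """Return the names of top-level, non-hidden directories in a local clone."""
    root = Path(repo_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith(".")
    )


if __name__ == "__main__":
    # "courses" is an assumed path to a clone created beforehand
    clone = Path("courses")
    if clone.is_dir():
        for name in list_course_folders(clone):
            print(name)
    else:
        print("No local clone found at", clone)
```

Hidden directories such as `.git` are skipped so that only folders holding course content are listed.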

Note: For the full interactive experience, consider enrolling in the Coursera specialization.

Competitor Comparisons

datasharing

Pros of datasharing

Focused specifically on data sharing best practices
Concise and easy to navigate single README file
Provides practical guidelines for researchers and data scientists

Cons of datasharing

Limited in scope compared to the broader data science curriculum
Lacks interactive elements or exercises for hands-on learning
May not cover more advanced topics in data science

Code Comparison

datasharing:

## The data are available (paper is not behind a paywall)
## The data are available to download
## The data are available in a useful format

courses:

library(swirl)
swirl()

Summary

The datasharing repository offers a focused guide on data sharing practices, while courses provides a comprehensive data science curriculum. datasharing is more accessible for quick reference but lacks the depth and interactivity of courses. The courses repository includes hands-on exercises using tools like swirl, making it more suitable for in-depth learning. However, datasharing's concise format makes it ideal for researchers looking for quick guidelines on sharing their data effectively.

pydata-book

Pros of pydata-book

Focused on Python for data analysis, providing in-depth coverage of pandas, NumPy, and other key libraries
Includes practical examples and datasets for hands-on learning
Regularly updated to reflect the latest developments in Python data science tools

Cons of pydata-book

Limited to Python ecosystem, not covering other data science languages or tools
Less comprehensive in terms of overall data science curriculum compared to courses
May be more challenging for absolute beginners due to its technical depth

Code Comparison

pydata-book:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
print(df.describe())

courses:

library(dplyr)
library(ggplot2)

# `data` is assumed to be a data frame with `category` and `value` columns
data %>%
  group_by(category) %>%
  summarize(mean_value = mean(value)) %>%
  ggplot(aes(x = category, y = mean_value)) +
  geom_bar(stat = "identity")

The pydata-book example demonstrates basic pandas and NumPy usage, while the courses example showcases data manipulation and visualization in R using dplyr and ggplot2.
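
To make the comparison concrete, the aggregation step of the R pipeline (group by category, then average) can be spelled out in plain Python. This is a minimal sketch using only the standard library; the function name `mean_by_category` and the sample rows are hypothetical and stand in for the `data` frame of the R snippet.

```python
from collections import defaultdict
from statistics import mean


def mean_by_category(rows):
    """Group (category, value) pairs by category and average each group."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}


# Hypothetical sample data, not taken from either repository
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
print(mean_by_category(rows))
```

In practice, pandas and dplyr each do this in a single grouped-aggregation call; the loop here only shows what that call computes.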

PythonDataScienceHandbook

Pros of PythonDataScienceHandbook

Comprehensive coverage of Python-specific data science tools and libraries
In-depth explanations with interactive Jupyter notebooks
More recent and up-to-date content

Cons of PythonDataScienceHandbook

Focused solely on Python, lacking coverage of other languages or tools
Less structured as a course, more of a reference guide
May be overwhelming for complete beginners

Code Comparison

PythonDataScienceHandbook:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('data.csv')
plt.scatter(data['x'], data['y'])
plt.show()

courses:

library(ggplot2)
data <- read.csv("data.csv")
ggplot(data, aes(x=x, y=y)) +
  geom_point()

The PythonDataScienceHandbook example reads a CSV file with pandas and draws a scatter plot with Matplotlib, while the courses example does the same in R with ggplot2. Both produce the same kind of plot; only the syntax and libraries differ.

PythonDataScienceHandbook offers a deep dive into Python-specific tools, making it ideal for those focusing on Python for data science. courses provides a broader, R-based overview of data science concepts, which may be more suitable for beginners or those seeking a comprehensive foundation in data science principles.

handson-ml

Pros of handson-ml

More focused on practical machine learning implementation
Includes Jupyter notebooks with interactive code examples
Covers a wider range of modern ML techniques and frameworks

Cons of handson-ml

Less comprehensive coverage of general data science topics
May be more challenging for absolute beginners
Requires more setup and dependencies to run examples

Code Comparison

handson-ml:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

courses:

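library(caret)

# Random forest evaluated with 5-fold cross-validation;
# `training_data` is assumed to be a data frame with an outcome column `y`
model <- train(y ~ ., data = training_data,
               method = "rf",
               trControl = trainControl(method = "cv", number = 5))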

The handson-ml example demonstrates a neural network model using TensorFlow, while the courses example shows a random forest model using R's caret package. This highlights the different focus areas and technologies covered by each repository.

handson-ml is more suited for those interested in deep learning and modern ML frameworks, while courses provides a broader introduction to data science concepts and traditional statistical methods.
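
The caret call above requests 5-fold cross-validation through `trainControl(method = "cv", number = 5)`. As a rough illustration of what k-fold splitting means, the sketch below partitions sample indices into k folds in plain Python. It is not how caret assigns folds internally; the contiguous, unshuffled split and the name `k_fold_indices` are assumptions chosen to keep the example deterministic.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Ten samples split into five folds: each sample is held out exactly once
folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("held out:", test)
```

A model is fitted on each `train` set and scored on the matching `test` set, and the k scores are then averaged.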

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

Pros of Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

Focused on a specific, advanced topic in data science
Provides hands-on examples using PyMC3 and TensorFlow Probability
Offers a more in-depth exploration of Bayesian methods

Cons of Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

Narrower scope compared to the broader data science curriculum
May be more challenging for beginners without prior statistics knowledge
Less comprehensive coverage of general data science topics

Code Comparison

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers:

import pymc3 as pm
with pm.Model() as model:
    theta = pm.Beta('theta', alpha=1, beta=1)
    y = pm.Bernoulli('y', p=theta, observed=[1,1,1,0,1,1,0])
    trace = pm.sample(1000, tune=1000)

courses:

library(dplyr)
data %>%
  group_by(category) %>%
  summarize(mean_value = mean(value, na.rm = TRUE))

The code snippets highlight the different focus areas:

Probabilistic-Programming-and-Bayesian-Methods-for-Hackers uses PyMC3 for Bayesian modeling
courses demonstrates data manipulation using R and dplyr

Both repositories offer valuable resources for data science learners, with Probabilistic-Programming-and-Bayesian-Methods-for-Hackers providing a deep dive into Bayesian methods, while courses offers a broader curriculum covering various data science topics.
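
The PyMC3 snippet above pairs a Beta(1, 1) prior with Bernoulli observations. That is a conjugate pair, so the posterior is also available in closed form, which gives a quick check on what the sampler should approximate. The sketch below works this out with the standard library only; the helper name `beta_bernoulli_posterior` is hypothetical and not part of PyMC3.

```python
from fractions import Fraction


def beta_bernoulli_posterior(alpha, beta, observations):
    """Update a Beta(alpha, beta) prior with 0/1 observations."""
    successes = sum(observations)
    failures = len(observations) - successes
    # Conjugate update: add successes to alpha and failures to beta
    return alpha + successes, beta + failures


# Same observations as in the PyMC3 snippet: five ones and two zeros
observed = [1, 1, 1, 0, 1, 1, 0]
post_alpha, post_beta = beta_bernoulli_posterior(1, 1, observed)
posterior_mean = Fraction(post_alpha, post_alpha + post_beta)
print(f"Posterior: Beta({post_alpha}, {post_beta}), mean = {posterior_mean}")
```

With five successes and two failures the posterior is Beta(6, 3), whose mean is 2/3, so the sampled values of `theta` should average close to that.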

TensorFlow-Examples

Pros of TensorFlow-Examples

Focused specifically on TensorFlow, providing in-depth examples
More up-to-date with recent machine learning techniques and TensorFlow versions
Includes practical examples for various neural network architectures

Cons of TensorFlow-Examples

Narrower scope, covering only TensorFlow and not broader data science topics
Less comprehensive course structure compared to courses
May be more challenging for beginners without prior machine learning knowledge

Code Comparison

TensorFlow-Examples:

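import tensorflow as tf

# Session-based (TF v1-style) execution; under TF v2 it requires the compat.v1 API
tf.compat.v1.disable_eager_execution()

# Create a constant op
hello = tf.constant('Hello, TensorFlow!')

# Start a TF session and evaluate the op
with tf.compat.v1.Session() as sess:
    print(sess.run(hello))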

courses:

library(dplyr)

# Load and process data
data <- read.csv("data.csv")
processed_data <- data %>%
  filter(!is.na(value)) %>%
  group_by(category) %>%
  summarize(mean_value = mean(value))

The TensorFlow-Examples code demonstrates basic TensorFlow usage, while the courses code shows data manipulation in R, reflecting the different focus of each repository.

README

Data Science Specialization

These are the course materials for the Johns Hopkins Data Science Specialization on Coursera.

https://www.coursera.org/specialization/jhudatascience/1

Materials are under development and subject to change.

Contributors

Brian Caffo
Jeff Leek
Roger Peng
Nick Carchedi
Sean Kross

License

These course materials are available under the Creative Commons Attribution NonCommercial ShareAlike (CC-NC-SA) license (http://www.tldrlegal.com/l/CC-NC-SA).
