courses
Course materials for the Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1
Top Related Projects
The Leek group guide to data sharing
Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media
Python Data Science Handbook: full text in Jupyter Notebooks
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
Quick Overview
The DataScienceSpecialization/courses repository is a comprehensive collection of course materials for the Data Science Specialization offered by Johns Hopkins University on Coursera. It contains lecture slides, assignments, and supplementary resources for nine courses covering various aspects of data science, from R programming to machine learning and data products.
Pros
- Comprehensive coverage of data science topics
- High-quality, university-level content
- Freely available under a Creative Commons (CC BY-NC-SA) license
- Regular updates and maintenance
Cons
- May be overwhelming for beginners
- Some content might become outdated over time
- Requires self-discipline and motivation for self-paced learning
- Limited interaction compared to paid Coursera courses
Code Examples
As this is not a code library but a collection of course materials, there are no specific code examples to showcase. However, the repository contains numerous R scripts and markdown files with code snippets related to data science topics.
Getting Started
Since this is not a code library, there's no specific installation or setup process. To get started with the course materials:
- Visit the repository: https://github.com/DataScienceSpecialization/courses
- Browse the course folders to find specific topics of interest
- Download or clone the repository (git clone https://github.com/DataScienceSpecialization/courses.git) to access materials locally
- Follow the course structure and complete assignments as desired
Note: For the full interactive experience, consider enrolling in the Coursera specialization.
Competitor Comparisons
The Leek group guide to data sharing
Pros of datasharing
- Focused specifically on data sharing best practices
- Concise, easy-to-navigate single README file
- Provides practical guidelines for researchers and data scientists
Cons of datasharing
- Limited in scope compared to the broader data science curriculum
- Lacks interactive elements or exercises for hands-on learning
- May not cover more advanced topics in data science
Code comparison
datasharing:
## The data are available (paper is not behind a paywall)
## The data are available to download
## The data are available in a useful format
courses:
library(swirl)
swirl()
Summary
The datasharing repository offers a focused guide on data sharing practices, while courses provides a comprehensive data science curriculum. datasharing is more accessible for quick reference but lacks the depth and interactivity of courses. The courses repository includes hands-on exercises using tools like swirl, making it more suitable for in-depth learning. However, datasharing's concise format makes it ideal for researchers looking for quick guidelines on sharing their data effectively.
Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media
Pros of pydata-book
- Focused on Python for data analysis, providing in-depth coverage of pandas, NumPy, and other key libraries
- Includes practical examples and datasets for hands-on learning
- Regularly updated to reflect the latest developments in Python data science tools
Cons of pydata-book
- Limited to Python ecosystem, not covering other data science languages or tools
- Less comprehensive in terms of overall data science curriculum compared to courses
- May be more challenging for absolute beginners due to its technical depth
Code Comparison
pydata-book:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
print(df.describe())
courses:
library(dplyr)
library(ggplot2)
# Assumes a data frame named data with columns category and value
data %>%
  group_by(category) %>%
  summarize(mean_value = mean(value)) %>%
  ggplot(aes(x = category, y = mean_value)) +
  geom_bar(stat = "identity")
The pydata-book example demonstrates basic pandas and NumPy usage, while the courses example showcases data manipulation and visualization in R using dplyr and ggplot2.
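For readers curious what df.describe() reports for each numeric column, here is a minimal standard-library sketch that computes the same eight statistics (count, mean, std, min, the three quartiles, max) for a single list of numbers. It assumes pandas' defaults of a sample standard deviation (n - 1 denominator) and linearly interpolated quartiles; the describe helper and the sample values are hypothetical and not taken from pydata-book.

```python
import statistics

def describe(values):
    """Summary statistics for one numeric column, mirroring the rows
    pandas' DataFrame.describe() reports for numeric data."""
    # method="inclusive" gives linear interpolation between data points,
    # matching pandas' default quantile interpolation.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (n - 1 denominator)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }

column_a = [0.5, -1.2, 0.3, 2.0, -0.4]  # hypothetical column values
for name, value in describe(column_a).items():
    print(f"{name:>5}: {value:.4f}")
```

pandas computes these per column across the whole DataFrame; the sketch handles one column to keep the arithmetic visible.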
Python Data Science Handbook: full text in Jupyter Notebooks
Pros of PythonDataScienceHandbook
- Comprehensive coverage of Python-specific data science tools and libraries
- In-depth explanations with interactive Jupyter notebooks
- More recent and up-to-date content
Cons of PythonDataScienceHandbook
- Focused solely on Python, lacking coverage of other languages or tools
- Less structured as a course, more of a reference guide
- May be overwhelming for complete beginners
Code Comparison
PythonDataScienceHandbook:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data.csv')
plt.scatter(data['x'], data['y'])
plt.show()
courses:
library(ggplot2)
data <- read.csv("data.csv")
ggplot(data, aes(x = x, y = y)) +
  geom_point()
The PythonDataScienceHandbook example imports NumPy, pandas, and Matplotlib, while the courses example uses R with ggplot2. Both produce the same scatter plot of x against y, using the syntax and libraries of their respective languages.
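Both snippets begin by loading data.csv and plotting its x and y columns. The loading step can be sketched with Python's built-in csv module when pandas is not available; this is an illustration only, the load_xy helper is hypothetical, and the in-memory sample stands in for the placeholder file, whose contents are not given here.

```python
import csv
import io

def load_xy(file_obj):
    """Read the x and y columns of a CSV file as two lists of floats."""
    xs, ys = [], []
    for row in csv.DictReader(file_obj):  # first line is treated as the header
        xs.append(float(row["x"]))
        ys.append(float(row["y"]))
    return xs, ys

# In-memory stand-in for data.csv. With a real file:
#   with open("data.csv", newline="") as f:
#       xs, ys = load_xy(f)
sample = io.StringIO("x,y\n1,2.5\n2,3.1\n3,4.8\n")
xs, ys = load_xy(sample)
print(xs, ys)
```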
PythonDataScienceHandbook offers a deep dive into Python-specific tools, making it ideal for those focusing on Python for data science. courses provides a broader overview of data science concepts, taught primarily in R, which may be more suitable for beginners or those seeking a comprehensive foundation in data science principles.
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Pros of handson-ml
- More focused on practical machine learning implementation
- Includes Jupyter notebooks with interactive code examples
- Covers a wider range of modern ML techniques and frameworks
Cons of handson-ml
- Less comprehensive coverage of general data science topics
- May be more challenging for absolute beginners
- Requires more setup and dependencies to run examples
Code Comparison
handson-ml:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
courses:
library(caret)
model <- train(y ~ ., data = training_data,
               method = "rf",
               trControl = trainControl(method = "cv", number = 5))
The handson-ml example demonstrates a neural network model using TensorFlow, while the courses example shows a random forest model using R's caret package. This highlights the different focus areas and technologies covered by each repository.
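The caret call requests 5-fold cross-validation through trainControl(method = "cv", number = 5): the rows are split into five folds, and each fold is held out once for evaluation while the model is fit on the remaining four. The splitting step can be sketched in plain Python as follows; k_fold_indices is a hypothetical helper, not caret's implementation, so its fold assignments will differ from caret's.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Split row indices 0..n-1 into k shuffled folds and yield
    (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one row.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 12 hypothetical rows split into 5 folds
for train, test in k_fold_indices(12, k=5):
    print(len(train), len(test))
```

Every row lands in exactly one test fold, so averaging the five held-out scores uses each observation for evaluation once.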
handson-ml is more suited for those interested in deep learning and modern ML frameworks, while courses provides a broader introduction to data science concepts and traditional statistical methods.
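To make the Keras snippet less of a black box, the sketch below computes by hand what its two Dense layers do on a single input: a ReLU layer with 64 units followed by a softmax layer with 10 outputs that sum to 1. It is a plain-Python illustration with random, untrained weights; the 8-feature input size and the Gaussian initialization are assumptions (the original snippet leaves the input shape unspecified, and Keras uses its own default initializers).

```python
import math
import random

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one output per weight row."""
    z = [sum(w * x for w, x in zip(row, inputs)) + b
         for row, b in zip(weights, biases)]
    return activation(z)

def relu(z):
    return [max(0.0, v) for v in z]

def softmax(z):
    m = max(z)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_in, n_out, rng):
    """Hypothetical untrained weights (Gaussian) and zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(42)
x = [rng.random() for _ in range(8)]   # hypothetical 8-feature input
w1, b1 = random_layer(8, 64, rng)      # plays the role of Dense(64, activation='relu')
w2, b2 = random_layer(64, 10, rng)     # plays the role of Dense(10, activation='softmax')
hidden = dense(x, w1, b1, relu)
probs = dense(hidden, w2, b2, softmax)
print(len(probs), round(sum(probs), 6))
```

Training would adjust w1, b1, w2, and b2; the forward computation itself stays the same.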
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Pros of Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
- Focused on a specific, advanced topic in data science
- Provides hands-on examples using PyMC3 and TensorFlow Probability
- Offers a more in-depth exploration of Bayesian methods
Cons of Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
- Narrower scope compared to the broader data science curriculum
- May be more challenging for beginners without prior statistics knowledge
- Less comprehensive coverage of general data science topics
Code Comparison
Probabilistic-Programming-and-Bayesian-Methods-for-Hackers:
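import pymc3 as pm
# Beta(1, 1) prior on the success probability theta, Bernoulli likelihood
with pm.Model() as model:
    theta = pm.Beta('theta', alpha=1, beta=1)
    y = pm.Bernoulli('y', p=theta, observed=[1, 1, 1, 0, 1, 1, 0])
    # Draw 1000 posterior samples after 1000 tuning steps
    trace = pm.sample(1000, tune=1000)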
courses:
library(dplyr)
data %>%
  group_by(category) %>%
  summarize(mean_value = mean(value, na.rm = TRUE))
The code snippets highlight the different focus areas:
- Probabilistic-Programming-and-Bayesian-Methods-for-Hackers uses PyMC3 for Bayesian modeling
- courses demonstrates data manipulation using R and dplyr
Both repositories offer valuable resources for data science learners, with Probabilistic-Programming-and-Bayesian-Methods-for-Hackers providing a deep dive into Bayesian methods, while courses offers a broader curriculum covering various data science topics.
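The PyMC3 model has a closed-form answer that is useful for checking the sampler: a Beta(1, 1) prior combined with Bernoulli observations gives a Beta posterior whose parameters are the prior's plus the counts of ones and zeros. With five 1s and two 0s in the observed data, that is Beta(6, 3), so the mean of the sampled theta values should come out near 6/9, about 0.667. A minimal sketch of this conjugate update follows; the helper name is hypothetical.

```python
def beta_bernoulli_posterior(alpha, beta, observations):
    """Conjugate update: a Beta(alpha, beta) prior on theta combined with
    Bernoulli observations gives Beta(alpha + successes, beta + failures)."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures

a, b = beta_bernoulli_posterior(1, 1, [1, 1, 1, 0, 1, 1, 0])
posterior_mean = a / (a + b)
print(f"Beta({a}, {b}), posterior mean = {posterior_mean:.3f}")
# prints: Beta(6, 3), posterior mean = 0.667
```

Conjugacy only covers simple models like this one; PyMC3's sampling approach is what generalizes to models without a closed form.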
TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
Pros of TensorFlow-Examples
- Focused specifically on TensorFlow, providing in-depth examples
- More up-to-date with recent machine learning techniques and TensorFlow versions
- Includes practical examples for various neural network architectures
Cons of TensorFlow-Examples
- Narrower scope, covering only TensorFlow and not broader data science topics
- Less comprehensive course structure compared to courses
- May be more challenging for beginners without prior machine learning knowledge
Code Comparison
TensorFlow-Examples:
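import tensorflow as tf
# tf.Session is TensorFlow 1.x API; the tf.compat.v1 equivalents below
# also run under TF 2.x once eager execution is disabled.
tf.compat.v1.disable_eager_execution()
# Create a Constant op
hello = tf.constant('Hello, TensorFlow!')
# Start a TF session
with tf.compat.v1.Session() as sess:
    print(sess.run(hello))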
courses:
library(dplyr)
# Load and process data
data <- read.csv("data.csv")
processed_data <- data %>%
  filter(!is.na(value)) %>%
  group_by(category) %>%
  summarize(mean_value = mean(value))
The TensorFlow-Examples code demonstrates basic TensorFlow usage, while the courses code shows data manipulation in R, reflecting the different focus of each repository.
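For comparison with the R pipeline, the same filter, group, and summarize steps can be sketched in plain Python. Missing values are represented as None (an assumption standing in for R's NA), and the rows are hypothetical sample data. As with filter() followed by group_by(), a category whose values are all missing simply disappears from the result.

```python
from collections import defaultdict

def mean_by_category(rows):
    """Drop rows whose value is missing (None), group the rest by category,
    and return {category: mean_value}, like the dplyr pipeline above."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, value in rows:
        if value is None:          # filter(!is.na(value))
            continue
        sums[category] += value    # group_by(category)
        counts[category] += 1
    # summarize(mean_value = mean(value))
    return {c: sums[c] / counts[c] for c in sums}

rows = [("a", 1.0), ("b", 4.0), ("a", None), ("a", 3.0), ("b", 6.0)]
print(mean_by_category(rows))
# prints: {'a': 2.0, 'b': 5.0}
```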
README
Data Science Specialization
These are the course materials for the Johns Hopkins Data Science Specialization on Coursera
https://www.coursera.org/specialization/jhudatascience/1
Materials are under development and subject to change.
Contributors
- Brian Caffo
- Jeff Leek
- Roger Peng
- Nick Carchedi
- Sean Kross
License
These course materials are available under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license (http://www.tldrlegal.com/l/CC-NC-SA).