handson-ml2
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Top Related Projects
An Open Source Machine Learning Framework for Everyone
scikit-learn: machine learning in Python
Deep Learning for humans
Tensors and Dynamic neural networks in Python with strong GPU acceleration
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
The fastai deep learning library
Quick Overview
Handson-ml2 is a comprehensive repository containing Jupyter notebooks and Python scripts that accompany the O'Reilly book "Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow". It provides practical examples and exercises covering various machine learning and deep learning concepts using popular libraries.
Pros
- Extensive coverage of machine learning topics, from basic to advanced
- Well-structured notebooks with clear explanations and code examples
- Regularly updated to keep pace with the latest versions of libraries
- Includes both Jupyter notebooks and Python scripts for flexibility
Cons
- May be overwhelming for absolute beginners in machine learning
- Requires a significant time investment to work through all materials
- Some examples might become outdated as libraries evolve rapidly
- Dependency on multiple libraries can lead to potential compatibility issues
Code Examples
- Loading and preprocessing data using Scikit-Learn:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
- Creating and training a simple neural network with Keras:
from tensorflow import keras
model = keras.models.Sequential([
keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1e-3))
history = model.fit(X_train_scaled, y_train, epochs=20, validation_split=0.2)
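- Implementing a random forest classifier using Scikit-Learn (on the Iris dataset, because the housing target above is continuous and a classifier would reject it):
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Random Forest Accuracy: {accuracy:.2f}")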
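The first example fits the scaler on the training set only and then reuses it on the test set. A minimal pure-Python sketch of that idea for a single feature column follows. The helper names (fit_standardizer, standardize) and the toy values are illustrative assumptions, not Scikit-Learn's API; like StandardScaler, the sketch uses the population standard deviation.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_column):
    """Learn the mean and population std from the training data only."""
    mean = fmean(train_column)
    std = pstdev(train_column)
    # Guard against constant columns to avoid dividing by zero.
    return mean, (std if std > 0 else 1.0)


def standardize(column, mean, std):
    """Apply statistics that were learned elsewhere (e.g. on the training set)."""
    return [(x - mean) / std for x in column]


train = [2.0, 4.0, 6.0, 8.0]  # toy values, not taken from the housing data
test = [5.0, 10.0]

mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)  # reuses the training statistics
```

Computing the statistics from the training column alone keeps information about the test set out of the preprocessing step, which is why the original example calls fit_transform on the training data but only transform on the test data.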
Getting Started
- Clone the repository:
git clone https://github.com/ageron/handson-ml2.git
- Install dependencies:
cd handson-ml2
pip install -r requirements.txt
- Launch Jupyter Notebook:
jupyter notebook
- Open and run the notebooks in the handson-ml2 directory to start exploring the examples and exercises.
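After installing the dependencies, it can help to confirm that the main libraries are actually visible to the interpreter that Jupyter will use. The following is a small sketch using only the standard library; the function name installed_versions and the list of package names are assumptions chosen for illustration, not something the repository provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names assumed from the libraries this project is built around.
report = installed_versions(["scikit-learn", "tensorflow", "jupyter"])
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any entry reported as NOT INSTALLED suggests the install step ran in a different environment than the one currently active.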
Competitor Comparisons
An Open Source Machine Learning Framework for Everyone
Pros of tensorflow
- Comprehensive, official library for machine learning and deep learning
- Extensive ecosystem with tools, extensions, and community support
- High-performance, scalable for large-scale deployments
Cons of tensorflow
- Steeper learning curve for beginners
- More complex setup and configuration
- Frequent updates may lead to compatibility issues
Code Comparison
handson-ml2:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
tensorflow:
import tensorflow as tf
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
Summary
handson-ml2 is a practical, beginner-friendly repository focused on hands-on machine learning examples using various libraries, including TensorFlow. It provides a gentler introduction to machine learning concepts and implementation.
tensorflow is the official repository for the TensorFlow library, offering a powerful and flexible framework for machine learning and deep learning. It's more suitable for advanced users and large-scale projects but requires more expertise to utilize effectively.
Both repositories use similar code structures for creating neural networks, with handson-ml2 often providing more context and explanations around the code examples.
scikit-learn: machine learning in Python
Pros of scikit-learn
- Comprehensive machine learning library with a wide range of algorithms and tools
- Well-established, mature project with extensive documentation and community support
- Designed for production use with efficient implementations and scalability features
Cons of scikit-learn
- Steeper learning curve for beginners due to its extensive functionality
- Less focus on deep learning and neural networks compared to handson-ml2
- May require additional libraries for more advanced machine learning tasks
Code Comparison
handson-ml2:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
tf.keras.layers.Dense(1, activation='sigmoid')
])
scikit-learn:
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(hidden_layer_sizes=(64,), activation='relu', solver='adam')
model.fit(X_train, y_train)
While handson-ml2 focuses on TensorFlow and Keras for neural networks, scikit-learn provides a simpler API for various machine learning algorithms, including neural networks. The handson-ml2 repository is more suited for learning and experimentation, especially with deep learning, while scikit-learn is designed for practical implementation of machine learning models in production environments.
Deep Learning for humans
Pros of Keras
- Comprehensive deep learning library with extensive documentation
- Supports multiple backend engines (TensorFlow, JAX, and PyTorch in Keras 3; TensorFlow, Theano, and CNTK in the older multi-backend releases)
- Large community and ecosystem of extensions
Cons of Keras
- Less focus on machine learning concepts and theory
- May be overwhelming for beginners due to its extensive API
- Limited coverage of non-neural network algorithms
Code Comparison
Keras:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential([
Dense(64, activation='relu', input_shape=(10,)),
Dense(1, activation='sigmoid')
])
Handson-ml2:
import tensorflow as tf
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
tf.keras.layers.Dense(1, activation='sigmoid')
])
Key Differences
- Handson-ml2 is a comprehensive machine learning tutorial with practical examples
- Keras is a high-level neural network library focused on deep learning
- Handson-ml2 covers a broader range of ML topics, including data preprocessing and visualization
- Keras provides more advanced deep learning features and model architectures
- Handson-ml2 uses TensorFlow's implementation of Keras, while Keras supports multiple backends
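To make the multi-backend point concrete: standalone Keras 3 reads the KERAS_BACKEND environment variable when it is first imported. The sketch below only sets that variable; the helper name select_keras_backend is a hypothetical wrapper, and none of this applies to the tf.keras API used in the handson-ml2 notebooks.

```python
import os

# Backend names accepted by Keras 3 for training workflows.
SUPPORTED_BACKENDS = {"tensorflow", "jax", "torch"}


def select_keras_backend(name):
    """Set KERAS_BACKEND; this must run before `import keras` to take effect."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {sorted(SUPPORTED_BACKENDS)}"
        )
    os.environ["KERAS_BACKEND"] = name
    return name


select_keras_backend("tensorflow")
# import keras  # would now pick up the backend chosen above (requires Keras 3)
```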
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Comprehensive deep learning framework with extensive functionality
- Large, active community and ecosystem of tools and libraries
- Flexible and dynamic computational graph for easier debugging
Cons of PyTorch
- Steeper learning curve for beginners compared to handson-ml2
- Less focus on practical, hands-on examples and tutorials
- Requires more setup and configuration for basic tasks
Code Comparison
handson-ml2:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
PyTorch:
import torch.nn as nn
model = nn.Sequential(
nn.Linear(784, 64),
nn.ReLU(),
nn.Linear(64, 10),
nn.Softmax(dim=1)
)
Summary
handson-ml2 is a practical, beginner-friendly repository focused on machine learning tutorials and examples using various libraries. PyTorch, on the other hand, is a comprehensive deep learning framework offering more advanced features and flexibility. While PyTorch provides a powerful toolset for experienced practitioners, handson-ml2 may be more suitable for those looking to learn machine learning concepts through hands-on examples.
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Pros of ML-For-Beginners
- More comprehensive curriculum structure with 26 lessons covering various ML topics
- Includes hands-on projects and quizzes for practical learning
- Offers content in multiple languages, making it accessible to a wider audience
Cons of ML-For-Beginners
- Less focus on deep learning compared to handson-ml2
- May not cover advanced topics in as much depth as handson-ml2
- Primarily uses Scikit-learn, while handson-ml2 explores more libraries like TensorFlow
Code Comparison
ML-For-Beginners (using Scikit-learn):
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
handson-ml2 (using TensorFlow):
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)
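One detail worth noting about the Keras snippet above: loss='categorical_crossentropy' expects one-hot encoded targets, whereas integer class labels are usually paired with 'sparse_categorical_crossentropy'. The sketch below shows what one-hot encoding means in plain Python; the one_hot helper and the toy labels are illustrative assumptions (in practice keras.utils.to_categorical does this job).

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[label] = 1.0  # a single 1.0 marks the true class
        rows.append(row)
    return rows


y = [2, 0, 1]  # toy integer labels
y_one_hot = one_hot(y, num_classes=3)
# y_one_hot == [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```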
The fastai deep learning library
Pros of fastai
- Provides a high-level API for quick and easy model development
- Includes advanced techniques like mixed precision training and learning rate finder
- Offers a comprehensive ecosystem with integrated libraries and tools
Cons of fastai
- Steeper learning curve for beginners due to its opinionated approach
- Less flexibility for low-level customization compared to handson-ml2
- Primarily focused on PyTorch, limiting options for other frameworks
Code Comparison
fastai:
from fastai.vision.all import *
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_func(
path, get_image_files(path), valid_pct=0.2, seed=42,
label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=error_rate)  # vision_learner supersedes cnn_learner in recent fastai releases
learn.fine_tune(1)
handson-ml2:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
mlp = MLPRegressor(random_state=42)
mlp.fit(X_train_scaled, y_train)
README
Machine Learning Notebooks
The 3rd edition of my book will be released in October 2022. The notebooks are available at ageron/handson-ml3 and contain more up-to-date code.
This project aims to teach you the fundamentals of Machine Learning in Python. It contains the example code and solutions to the exercises in the second edition of my O'Reilly book Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow.
Note: If you are looking for the first edition notebooks, check out ageron/handson-ml. For the third edition, check out ageron/handson-ml3.
Quick Start
Want to play with these notebooks online without having to install anything?
Use any of the following services (I recommend Colab or Kaggle, since they offer free GPUs and TPUs).
WARNING: Please be aware that these services provide temporary environments: anything you do will be deleted after a while, so make sure you download any data you care about.
Just want to quickly look at some notebooks, without executing any code?
- github.com's notebook viewer also works, but it's not ideal: it's slower, the math equations are not always displayed correctly, and large notebooks often fail to open.
Want to run this project using a Docker image?
Read the Docker instructions.
Want to install this project on your own machine?
Start by installing Anaconda (or Miniconda), git, and if you have a TensorFlow-compatible GPU, install the GPU driver, as well as the appropriate version of CUDA and cuDNN (see TensorFlow's documentation for more details).
Next, clone this project by opening a terminal and typing the following commands (do not type the leading $ sign on each line; it just indicates that these are terminal commands):
$ git clone https://github.com/ageron/handson-ml2.git
$ cd handson-ml2
Next, run the following commands:
$ conda env create -f environment.yml
$ conda activate tf2
$ python -m ipykernel install --user --name=python3
Finally, start Jupyter:
$ jupyter notebook
If you need further instructions, read the detailed installation instructions.
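If the notebooks fail to import libraries after these steps, a common cause is that the tf2 environment is not the active one. Conda exports the active environment's name in the CONDA_DEFAULT_ENV variable, so a quick check can be sketched as follows; the helper names are hypothetical, and the check assumes the environment was activated with conda activate as shown above.

```python
import os

EXPECTED_ENV = "tf2"  # environment name activated in the commands above


def active_conda_env(environ=os.environ):
    """Return the active conda environment name, or None outside conda."""
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(environ=os.environ, expected=EXPECTED_ENV):
    """True when the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if not in_expected_env():
    print(
        f"Active conda env is {active_conda_env()!r}; expected {EXPECTED_ENV!r}. "
        f"Try: conda activate {EXPECTED_ENV}"
    )
```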
FAQ
Which Python version should I use?
I recommend Python 3.8. If you follow the installation instructions above, that's the version you will get. Most code will work with other versions of Python 3, but some libraries do not support Python 3.9 or 3.10 yet, which is why I recommend Python 3.8.
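To see which interpreter a notebook is actually running on, a check like the following can be pasted into a cell. It is only a sketch: the version_status helper and its wording are assumptions made here, based on the recommendation in the answer above.

```python
import sys

RECOMMENDED = (3, 8)  # version recommended in the answer above


def version_status(version_info=sys.version_info):
    """Classify a Python version against the project's recommendation."""
    major_minor = tuple(version_info[:2])
    if major_minor == RECOMMENDED:
        return "recommended"
    if major_minor[0] == 3:
        return "other Python 3: most code should work, some libraries may not"
    return "unsupported"


print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {version_status()}")
```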
I'm getting an error when I call load_housing_data()
Make sure you call fetch_housing_data() before you call load_housing_data(). If you're getting an HTTP error, make sure you're running the exact same code as in the notebook (copy/paste it if needed). If the problem persists, please check your network configuration.
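The ordering matters because the loader reads a file that only exists once the fetch step has downloaded it. The sketch below illustrates that dependency with the standard library only; load_housing_rows is a hypothetical stand-in, the housing.csv file name is an assumption, and the notebook's own loader may be implemented differently.

```python
import csv
from pathlib import Path


def load_housing_rows(housing_dir):
    """Read housing.csv from housing_dir, with a clear error if it is missing."""
    csv_path = Path(housing_dir) / "housing.csv"  # file name is an assumption
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path} not found: call fetch_housing_data() before loading"
        )
    with csv_path.open(newline="") as f:
        # Each row becomes a dict keyed by the CSV header.
        return list(csv.DictReader(f))
```

Failing early with an explicit message makes the missing fetch step obvious, instead of surfacing as a generic file error deeper in the notebook.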
I'm getting an SSL error on MacOSX
You probably need to install the SSL certificates (see this StackOverflow question). If you downloaded Python from the official website, then run /Applications/Python\ 3.8/Install\ Certificates.command in a terminal (change 3.8 to whatever version you installed). If you installed Python using MacPorts, run sudo port install curl-ca-bundle in a terminal.
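Before or after applying the fix, it can be useful to see where the running interpreter looks for CA certificates. The following sketch relies on the standard library's ssl.get_default_verify_paths(); the describe_ca_paths helper is a name introduced here for illustration.

```python
import os
import ssl


def describe_ca_paths():
    """Report where this Python looks for CA certificates and whether the file exists."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,  # None when no default CA file was found
        "capath": paths.capath,  # None when no default CA directory was found
        "openssl_cafile": paths.openssl_cafile,
        "openssl_cafile_exists": bool(paths.openssl_cafile)
        and os.path.isfile(paths.openssl_cafile),
    }


for key, value in describe_ca_paths().items():
    print(f"{key}: {value}")
```

If both cafile and capath come back as None, the interpreter has no default certificate store available, which is consistent with the SSL error described above.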
I've installed this project locally. How do I update it to the latest version?
See INSTALL.md
How do I update my Python libraries to the latest versions, when using Anaconda?
See INSTALL.md
Contributors
I would like to thank everyone who contributed to this project, either by providing useful feedback, filing issues or submitting Pull Requests. Special thanks go to Haesun Park and Ian Beauregard who reviewed every notebook and submitted many PRs, including help on some of the exercise solutions. Thanks as well to Steven Bunkley and Ziembla who created the docker directory, and to GitHub user SuperYorio who helped on some exercise solutions.