introduction_to_ml_with_python
Notebooks and code for the book "Introduction to Machine Learning with Python"
Top Related Projects
scikit-learn: machine learning in Python
The "Python Machine Learning (1st edition)" book code repository and info resource
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Python Data Science Handbook: full text in Jupyter Notebooks
The fastai deep learning library
An Open Source Machine Learning Framework for Everyone
Quick Overview
"Introduction to Machine Learning with Python" is a GitHub repository accompanying the book of the same name by Andreas Müller and Sarah Guido. It contains Jupyter notebooks and Python scripts that demonstrate various machine learning concepts and techniques using popular libraries like scikit-learn, NumPy, and pandas.
Pros
- Comprehensive coverage of machine learning topics, from basic concepts to advanced techniques
- Practical examples and real-world datasets for hands-on learning
- Well-structured code and explanations that align with the book's content
- Regular updates to keep pace with evolving libraries and best practices
Cons
- Some examples may become outdated as libraries evolve
- Requires prior knowledge of Python programming
- May not cover the latest cutting-edge machine learning techniques
- Limited focus on deep learning compared to traditional machine learning
Code Examples
- Loading and exploring a dataset:
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
print(df.head())
- Training a simple classifier:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print(f"Test set score: {knn.score(X_test, y_test):.2f}")
- Visualizing decision boundaries (the classifier is refit on only the first two features here, since the model trained above uses all four features and its boundary cannot be drawn in 2D):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
X2, y = iris.data[:, :2], iris.target
knn2 = KNeighborsClassifier(n_neighbors=1).fit(X2, y)
x0_min, x0_max = X2[:, 0].min() - 1, X2[:, 0].max() + 1
x1_min, x1_max = X2[:, 1].min() - 1, X2[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x0_min, x0_max, .02),
                     np.arange(x1_min, x1_max, .02))
Z = knn2.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdYlBu)
plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap=plt.cm.RdYlBu, edgecolor='k')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()
Getting Started
- Clone the repository:
git clone https://github.com/amueller/introduction_to_ml_with_python.git
- Install required packages:
pip install -r requirements.txt
- Launch Jupyter Notebook:
jupyter notebook
- Open and run the notebooks in the notebooks directory to explore machine learning concepts and examples.
Competitor Comparisons
scikit-learn: machine learning in Python
Pros of scikit-learn
- Comprehensive library with a wide range of machine learning algorithms and tools
- Highly optimized and efficient implementations for production use
- Extensive documentation and community support
Cons of scikit-learn
- Steeper learning curve for beginners
- Less focus on educational content and explanations
- More complex API for some advanced use cases
Code Comparison
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
scikit-learn:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
The code examples show that both repositories use similar syntax and structure for basic machine learning tasks. However, scikit-learn offers more advanced options and parameters for fine-tuning models and data processing.
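As a rough illustration of those extra options (a minimal sketch, not code from either repository), scikit-learn lets you chain preprocessing and a model in a Pipeline and tune hyperparameters with GridSearchCV:
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Chain scaling and an SVM, then search over the SVM's hyperparameters with cross-validation
pipe = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": [0.01, 0.1, 1]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))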
The "Python Machine Learning (1st edition)" book code repository and info resource
Pros of python-machine-learning-book
- More comprehensive coverage of advanced ML topics
- Includes deep learning and neural network concepts
- Regular updates and new editions to keep content current
Cons of python-machine-learning-book
- May be more challenging for absolute beginners
- Less focus on practical, hands-on examples
- Requires more background knowledge in mathematics and statistics
Code Comparison
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
python-machine-learning-book:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
Both repositories provide excellent resources for learning machine learning with Python. introduction_to_ml_with_python focuses on practical, beginner-friendly examples using scikit-learn, while python-machine-learning-book offers a more in-depth exploration of ML concepts and implementations.
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Pros of handson-ml
- More comprehensive coverage of advanced ML topics
- Includes deep learning and neural networks
- Regularly updated with newer ML techniques and libraries
Cons of handson-ml
- May be overwhelming for absolute beginners
- Requires more prior knowledge of Python and data science concepts
- Less focus on foundational ML concepts
Code Comparison
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
handson-ml:
import tensorflow as tf
from tensorflow import keras
model = keras.models.Sequential([
keras.layers.Dense(30, activation="relu", input_shape=[8]),
keras.layers.Dense(30, activation="relu"),
keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1e-3))
The code comparison shows that introduction_to_ml_with_python focuses on traditional ML algorithms using scikit-learn, while handson-ml includes more advanced topics like deep learning with TensorFlow and Keras.
Python Data Science Handbook: full text in Jupyter Notebooks
Pros of PythonDataScienceHandbook
- Covers a broader range of data science topics, including data manipulation, visualization, and statistics
- Provides in-depth explanations and examples for each topic
- Includes interactive Jupyter notebooks for hands-on learning
Cons of PythonDataScienceHandbook
- Less focused on machine learning algorithms and techniques
- May be overwhelming for beginners due to its comprehensive nature
- Requires more time investment to work through all the material
Code Comparison
PythonDataScienceHandbook:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.show()
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
The code examples highlight the different focus areas of each repository. PythonDataScienceHandbook emphasizes data visualization and exploration, while introduction_to_ml_with_python concentrates on machine learning algorithms and implementation using scikit-learn.
The fastai deep learning library
Pros of fastai
- More comprehensive and advanced deep learning library
- Offers high-level APIs for quick model development
- Includes cutting-edge techniques and best practices
Cons of fastai
- Steeper learning curve for beginners
- Focused primarily on deep learning, less coverage of traditional ML algorithms
- Requires more computational resources for some tasks
Code Comparison
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
fastai:
from fastai.vision.all import *
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_func(
path, get_image_files(path), valid_pct=0.2, seed=42,
label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
The code examples highlight the difference in focus and complexity between the two libraries. introduction_to_ml_with_python uses scikit-learn for traditional ML tasks, while fastai provides a high-level API for deep learning, particularly in computer vision tasks.
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Comprehensive deep learning framework with extensive capabilities
- Large ecosystem and community support
- Highly scalable for production environments
Cons of TensorFlow
- Steeper learning curve for beginners
- More complex setup and configuration
- Potentially overwhelming for simple ML tasks
Code Comparison
Introduction to ML with Python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
TensorFlow:
import tensorflow as tf
from tensorflow.keras import layers
model = tf.keras.Sequential([
layers.Dense(64, activation='relu', input_shape=(4,)),
layers.Dense(64, activation='relu'),
layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Introduction to ML with Python focuses on scikit-learn for basic ML concepts, while TensorFlow provides a more advanced framework for deep learning and complex models. The former is more suitable for beginners and simple tasks, while the latter offers greater flexibility and power for advanced applications.
Introduction to Machine Learning with Python
This repository holds the code for the forthcoming book "Introduction to Machine Learning with Python" by Andreas Mueller and Sarah Guido. You can find details about the book on the O'Reilly website.
The book requires the current stable version of scikit-learn, that is, 0.20.0. Most of the book can also be used with previous versions of scikit-learn, though you need to adjust the imports for everything from the model_selection module, mostly cross_val_score, train_test_split and GridSearchCV.
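For example, on scikit-learn releases before 0.18 those functions lived in different modules, so the adjustment looks roughly like this (a sketch, not code from the book):
# scikit-learn >= 0.18, as used in the book
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV
# scikit-learn < 0.18: adjust the imports roughly like this instead
# from sklearn.cross_validation import cross_val_score, train_test_split
# from sklearn.grid_search import GridSearchCV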
This repository provides the notebooks from which the book is created, together with the mglearn library of helper functions to create figures and datasets.
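A typical use of mglearn in the notebooks looks roughly like this (a sketch; the exact helper functions used vary by chapter):
import matplotlib.pyplot as plt
import mglearn
# Generate one of the book's small synthetic datasets and plot it
X, y = mglearn.datasets.make_forge()
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
plt.show()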
For the curious ones, the cover depicts a hellbender.
All datasets are included in the repository, with the exception of the aclImdb dataset, which you can download from the page of Andrew Maas. See the book for details.
If you get ImportError: No module named mglearn, you can try to install mglearn into your Python environment using the command pip install mglearn in your terminal, or !pip install mglearn in Jupyter Notebook.
Errata
Please note that the first print of the book is missing the following line when listing the assumed imports:
from IPython.display import display
Please add this line if you see an error involving display.
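For reference, the block of assumed imports then looks roughly like this (a sketch; see the book's preface for the exact list):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mglearn
from IPython.display import display  # the line missing from the first print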
The first print of the book used a function called plot_group_kfold. This has been renamed to plot_label_kfold because of a rename in scikit-learn.
Setup
To run the code, you need the packages numpy, scipy, scikit-learn, matplotlib, pandas and pillow. Some of the visualizations of decision trees and neural network structures also require graphviz. The chapter on text processing also requires nltk and spacy.
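For example, the decision tree figures follow a pattern roughly like this (a minimal sketch; the notebooks wrap parts of this in mglearn helpers):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
# Export the fitted tree to DOT format and render it with the graphviz package
dot_data = export_graphviz(tree, out_file=None, feature_names=iris.feature_names,
                           class_names=list(iris.target_names), filled=True)
graphviz.Source(dot_data)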
The easiest way to set up an environment is by installing Anaconda.
Installing packages with conda:
If you already have a Python environment set up and you are using the conda package manager, you can get all packages by running
conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz python-graphviz
For the chapter on text processing you also need to install nltk and spacy:
conda install nltk spacy
Installing packages with pip
If you already have a Python environment and are using pip to install packages, you need to run
pip install numpy scipy scikit-learn matplotlib pandas pillow graphviz
You also need to install the graphviz C library, which is easiest using a package manager. If you are using OS X and Homebrew, you can brew install graphviz. If you are on Ubuntu or Debian, you can apt-get install graphviz.
Installing graphviz on Windows can be tricky, and using conda / anaconda is recommended.
For the chapter on text processing you also need to install nltk and spacy:
pip install nltk spacy
Downloading English language model
For the text processing chapter, you need to download the English language model for spacy using
python -m spacy download en
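Once downloaded, the model is loaded in the text chapter roughly like this (a sketch; newer spaCy releases name the English model en_core_web_sm rather than en):
import spacy
# Load the English model installed by the download command above
en_nlp = spacy.load("en")
doc = en_nlp("The text processing chapter tokenizes and lemmatizes documents.")
print([token.lemma_ for token in doc])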
Submitting Errata
If you have errata for the (e-)book, please submit them via the O'Reilly Website. You can submit fixes to the code as pull-requests here, but I'd appreciate it if you would also submit them there, as this repository doesn't hold the "master notebooks".