introduction_to_ml_with_python
Notebooks and code for the book "Introduction to Machine Learning with Python"
Top Related Projects
scikit-learn: machine learning in Python
The "Python Machine Learning (1st edition)" book code repository and info resource
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Python Data Science Handbook: full text in Jupyter Notebooks
The fastai deep learning library
An Open Source Machine Learning Framework for Everyone
Quick Overview
"Introduction to Machine Learning with Python" is a GitHub repository accompanying the book of the same name by Andreas Müller and Sarah Guido. It contains Jupyter notebooks and Python scripts that demonstrate various machine learning concepts and techniques using popular libraries like scikit-learn, NumPy, and pandas.
Pros
- Comprehensive coverage of machine learning topics, from basic concepts to advanced techniques
- Practical examples and real-world datasets for hands-on learning
- Well-structured code and explanations that align with the book's content
- Regular updates to keep pace with evolving libraries and best practices
Cons
- Some examples may become outdated as libraries evolve
- Requires prior knowledge of Python programming
- May not cover the latest cutting-edge machine learning techniques
- Limited focus on deep learning compared to traditional machine learning
Code Examples
- Loading and exploring a dataset:
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
print(df.head())
- Training a simple classifier:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print(f"Test set score: {knn.score(X_test, y_test):.2f}")
- Visualizing decision boundaries (the classifier is refit on only the first two features here, since the model trained above uses all four features and its boundary cannot be drawn in 2D):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
X2, y = iris.data[:, :2], iris.target
knn2 = KNeighborsClassifier(n_neighbors=1).fit(X2, y)
x0_min, x0_max = X2[:, 0].min() - 1, X2[:, 0].max() + 1
x1_min, x1_max = X2[:, 1].min() - 1, X2[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x0_min, x0_max, .02),
                     np.arange(x1_min, x1_max, .02))
Z = knn2.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdYlBu)
plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap=plt.cm.RdYlBu, edgecolor='k')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()
Getting Started
- Clone the repository:
git clone https://github.com/amueller/introduction_to_ml_with_python.git
- Install required packages:
pip install -r requirements.txt
- Launch Jupyter Notebook:
jupyter notebook
- Open and run the notebooks in the notebooks directory to explore machine learning concepts and examples.
Competitor Comparisons
scikit-learn: machine learning in Python
Pros of scikit-learn
- Comprehensive library with a wide range of machine learning algorithms and tools
- Highly optimized and efficient implementations for production use
- Extensive documentation and community support
Cons of scikit-learn
- Steeper learning curve for beginners
- Less focus on educational content and explanations
- More complex API for some advanced use cases
Code Comparison
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
scikit-learn:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
The code examples show that both repositories use similar syntax and structure for basic machine learning tasks. However, scikit-learn offers more advanced options and parameters for fine-tuning models and data processing.
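As a rough illustration of those extra options (a minimal sketch, not code from either repository), scikit-learn lets you chain preprocessing and a model in a Pipeline and tune hyperparameters with GridSearchCV:
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Chain scaling and an SVM, then search over the SVM's hyperparameters with cross-validation
pipe = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": [0.01, 0.1, 1]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))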
The "Python Machine Learning (1st edition)" book code repository and info resource
Pros of python-machine-learning-book
- More comprehensive coverage of advanced ML topics
- Includes deep learning and neural network concepts
- Regular updates and new editions to keep content current
Cons of python-machine-learning-book
- May be more challenging for absolute beginners
- Less focus on practical, hands-on examples
- Requires more background knowledge in mathematics and statistics
Code Comparison
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
python-machine-learning-book:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
Both repositories provide excellent resources for learning machine learning with Python. introduction_to_ml_with_python focuses on practical, beginner-friendly examples using scikit-learn, while python-machine-learning-book offers a more in-depth exploration of ML concepts and implementations.
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Pros of handson-ml
- More comprehensive coverage of advanced ML topics
- Includes deep learning and neural networks
- Regularly updated with newer ML techniques and libraries
Cons of handson-ml
- May be overwhelming for absolute beginners
- Requires more prior knowledge of Python and data science concepts
- Less focus on foundational ML concepts
Code Comparison
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
handson-ml:
import tensorflow as tf
from tensorflow import keras
model = keras.models.Sequential([
keras.layers.Dense(30, activation="relu", input_shape=[8]),
keras.layers.Dense(30, activation="relu"),
keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1e-3))
The code comparison shows that introduction_to_ml_with_python focuses on traditional ML algorithms using scikit-learn, while handson-ml includes more advanced topics like deep learning with TensorFlow and Keras.
Python Data Science Handbook: full text in Jupyter Notebooks
Pros of PythonDataScienceHandbook
- Covers a broader range of data science topics, including data manipulation, visualization, and statistics
- Provides in-depth explanations and examples for each topic
- Includes interactive Jupyter notebooks for hands-on learning
Cons of PythonDataScienceHandbook
- Less focused on machine learning algorithms and techniques
- May be overwhelming for beginners due to its comprehensive nature
- Requires more time investment to work through all the material
Code Comparison
PythonDataScienceHandbook:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.show()
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
The code examples highlight the different focus areas of each repository. PythonDataScienceHandbook emphasizes data visualization and exploration, while introduction_to_ml_with_python concentrates on machine learning algorithms and implementation using scikit-learn.
The fastai deep learning library
Pros of fastai
- More comprehensive and advanced deep learning library
- Offers high-level APIs for quick model development
- Includes cutting-edge techniques and best practices
Cons of fastai
- Steeper learning curve for beginners
- Focused primarily on deep learning, less coverage of traditional ML algorithms
- Requires more computational resources for some tasks
Code Comparison
introduction_to_ml_with_python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
fastai:
from fastai.vision.all import *
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_func(
path, get_image_files(path), valid_pct=0.2, seed=42,
label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
The code examples highlight the difference in focus and complexity between the two libraries. introduction_to_ml_with_python uses scikit-learn for traditional ML tasks, while fastai provides a high-level API for deep learning, particularly in computer vision tasks.
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Comprehensive deep learning framework with extensive capabilities
- Large ecosystem and community support
- Highly scalable for production environments
Cons of TensorFlow
- Steeper learning curve for beginners
- More complex setup and configuration
- Potentially overwhelming for simple ML tasks
Code Comparison
Introduction to ML with Python:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
TensorFlow:
import tensorflow as tf
from tensorflow.keras import layers
model = tf.keras.Sequential([
layers.Dense(64, activation='relu', input_shape=(4,)),
layers.Dense(64, activation='relu'),
layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Introduction to ML with Python focuses on scikit-learn for basic ML concepts, while TensorFlow provides a more advanced framework for deep learning and complex models. The former is more suitable for beginners and simple tasks, while the latter offers greater flexibility and power for advanced applications.
Introduction to Machine Learning with Python
This repository holds the code for the forthcoming book "Introduction to Machine Learning with Python" by Andreas Mueller and Sarah Guido. You can find details about the book on the O'Reilly website.
The book requires the current stable version of scikit-learn, that is, 0.20.0. Most of the book can also be used with previous versions of scikit-learn, though you need to adjust the imports for everything from the model_selection module, mostly cross_val_score, train_test_split and GridSearchCV.
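For example, on scikit-learn releases before 0.18 those functions lived in different modules, so the adjustment looks roughly like this (a sketch, not code from the book):
# scikit-learn >= 0.18, as used in the book
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV
# scikit-learn < 0.18: adjust the imports roughly like this instead
# from sklearn.cross_validation import cross_val_score, train_test_split
# from sklearn.grid_search import GridSearchCV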
This repository provides the notebooks from which the book is created, together with the mglearn library of helper functions to create figures and datasets.
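A typical use of mglearn in the notebooks looks roughly like this (a sketch; the exact helper functions used vary by chapter):
import matplotlib.pyplot as plt
import mglearn
# Generate one of the book's small synthetic datasets and plot it
X, y = mglearn.datasets.make_forge()
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
plt.show()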
For the curious ones, the cover depicts a hellbender.
All datasets are included in the repository, with the exception of the aclImdb dataset, which you can download from the page of Andrew Maas. See the book for details.
If you get ImportError: No module named mglearn, you can try to install mglearn into your Python environment using the command pip install mglearn in your terminal, or !pip install mglearn in Jupyter Notebook.
Errata
Please note that the first print of the book is missing the following line when listing the assumed imports:
from IPython.display import display
Please add this line if you see an error involving display.
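For reference, the block of assumed imports then looks roughly like this (a sketch; see the book's preface for the exact list):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mglearn
from IPython.display import display  # the line missing from the first print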
The first print of the book used a function called plot_group_kfold. This has been renamed to plot_label_kfold because of a rename in scikit-learn.
Setup
To run the code, you need the packages numpy, scipy, scikit-learn, matplotlib, pandas and pillow. Some of the visualizations of decision trees and neural network structures also require graphviz. The chapter on text processing also requires nltk and spacy.
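For example, the decision tree figures follow a pattern roughly like this (a minimal sketch; the notebooks wrap parts of this in mglearn helpers):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
# Export the fitted tree to DOT format and render it with the graphviz package
dot_data = export_graphviz(tree, out_file=None, feature_names=iris.feature_names,
                           class_names=list(iris.target_names), filled=True)
graphviz.Source(dot_data)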
The easiest way to set up an environment is by installing Anaconda.
Installing packages with conda:
If you already have a Python environment set up and you are using the conda package manager, you can get all packages by running
conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz python-graphviz
For the chapter on text processing you also need to install nltk and spacy:
conda install nltk spacy
Installing packages with pip
If you already have a Python environment and are using pip to install packages, you need to run
pip install numpy scipy scikit-learn matplotlib pandas pillow graphviz
You also need to install the graphviz C library, which is easiest using a package manager. If you are using OS X and Homebrew, you can brew install graphviz. If you are on Ubuntu or Debian, you can apt-get install graphviz.
Installing graphviz on Windows can be tricky, and using conda / anaconda is recommended.
For the chapter on text processing you also need to install nltk and spacy:
pip install nltk spacy
Downloading English language model
For the text processing chapter, you need to download the English language model for spacy using
python -m spacy download en
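Once downloaded, the model is loaded in the text chapter roughly like this (a sketch; newer spaCy releases name the English model en_core_web_sm rather than en):
import spacy
# Load the English model installed by the download command above
en_nlp = spacy.load("en")
doc = en_nlp("The text processing chapter tokenizes and lemmatizes documents.")
print([token.lemma_ for token in doc])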
Submitting Errata
If you have errata for the (e-)book, please submit them via the O'Reilly Website. You can submit fixes to the code as pull-requests here, but I'd appreciate it if you would also submit them there, as this repository doesn't hold the "master notebooks".