
amueller/introduction_to_ml_with_python

Notebooks and code for the book "Introduction to Machine Learning with Python"

Top Related Projects

scikit-learn: machine learning in Python

The "Python Machine Learning (1st edition)" book code repository and info resource

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

Python Data Science Handbook: full text in Jupyter Notebooks

The fastai deep learning library

An Open Source Machine Learning Framework for Everyone

Quick Overview

"Introduction to Machine Learning with Python" is a GitHub repository accompanying the book of the same name by Andreas Müller and Sarah Guido. It contains Jupyter notebooks and Python scripts that demonstrate various machine learning concepts and techniques using popular libraries like scikit-learn, NumPy, and pandas.

Pros

  • Comprehensive coverage of machine learning topics, from basic concepts to advanced techniques
  • Practical examples and real-world datasets for hands-on learning
  • Well-structured code and explanations that align with the book's content
  • Regular updates to keep pace with evolving libraries and best practices

Cons

  • Some examples may become outdated as libraries evolve
  • Requires prior knowledge of Python programming
  • May not cover the latest cutting-edge machine learning techniques
  • Limited focus on deep learning compared to traditional machine learning

Code Examples

  1. Loading and exploring a dataset:
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
print(df.head())
  2. Training a simple classifier:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print(f"Test set score: {knn.score(X_test, y_test):.2f}")
  3. Visualizing decision boundaries (using only the first two features so the boundary can be drawn in 2D):
import numpy as np
import matplotlib.pyplot as plt

# refit the classifier on the first two features only; the model trained above
# uses all four features and cannot be evaluated on a 2D grid
X2, y = iris.data[:, :2], iris.target
knn2 = KNeighborsClassifier(n_neighbors=1).fit(X2, y)

x0_min, x0_max = X2[:, 0].min() - 1, X2[:, 0].max() + 1
x1_min, x1_max = X2[:, 1].min() - 1, X2[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x0_min, x0_max, .02),
                     np.arange(x1_min, x1_max, .02))
Z = knn2.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdYlBu)
plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap=plt.cm.RdYlBu, edgecolor='k')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()

Getting Started

  1. Clone the repository:

    git clone https://github.com/amueller/introduction_to_ml_with_python.git
    
  2. Install required packages:

    pip install -r requirements.txt
    
  3. Launch Jupyter Notebook:

    jupyter notebook
    
  4. Open and run the notebooks to explore the machine learning concepts and examples covered in the book; a quick environment check is sketched below.
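
Before working through the notebooks, it can help to confirm that the core libraries import correctly. A minimal sketch of such a check (the version numbers you see will depend on when you installed the requirements):

import sklearn, numpy, pandas, matplotlib

# print the installed versions to confirm the environment is usable
print("scikit-learn:", sklearn.__version__)
print("NumPy:", numpy.__version__)
print("pandas:", pandas.__version__)
print("matplotlib:", matplotlib.__version__)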

Competitor Comparisons

scikit-learn: machine learning in Python

Pros of scikit-learn

  • Comprehensive library with a wide range of machine learning algorithms and tools
  • Highly optimized and efficient implementations for production use
  • Extensive documentation and community support

Cons of scikit-learn

  • Steeper learning curve for beginners
  • Less focus on educational content and explanations
  • More complex API for some advanced use cases

Code Comparison

introduction_to_ml_with_python:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

scikit-learn:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

The code examples show that both repositories use similar syntax and structure for basic machine learning tasks. However, scikit-learn offers more advanced options and parameters for fine-tuning models and data processing.
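As an illustration of those more advanced options, the following sketch tunes the number of neighbors with a pipeline and grid search. The parameter grid and step names are illustrative choices, not taken from either repository:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# chain scaling and the classifier so cross-validation covers both steps
pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Test set score:", grid.score(X_test, y_test))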

The "Python Machine Learning (1st edition)" book code repository and info resource

Pros of python-machine-learning-book

  • More comprehensive coverage of advanced ML topics
  • Includes deep learning and neural network concepts
  • Regular updates and new editions to keep content current

Cons of python-machine-learning-book

  • May be more challenging for absolute beginners
  • Less focus on practical, hands-on examples
  • Requires more background knowledge in mathematics and statistics

Code Comparison

introduction_to_ml_with_python:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

python-machine-learning-book:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

Both repositories provide excellent resources for learning machine learning with Python. introduction_to_ml_with_python focuses on practical, beginner-friendly examples using scikit-learn, while python-machine-learning-book offers a more in-depth exploration of ML concepts and implementations.
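For context, the python-machine-learning-book snippet standardizes the features before fitting a classifier. A completed version of that standardize-then-fit pattern might look as follows; the dataset and classifier settings here are illustrative and not taken from either book's repository:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# fit the scaler on the training data only, then apply the same transform to the test data
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

lr = LogisticRegression().fit(X_train_std, y_train)
print("Test set accuracy:", lr.score(X_test_std, y_test))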

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

Pros of handson-ml

  • More comprehensive coverage of advanced ML topics
  • Includes deep learning and neural networks
  • Regularly updated with newer ML techniques and libraries

Cons of handson-ml

  • May be overwhelming for absolute beginners
  • Requires more prior knowledge of Python and data science concepts
  • Less focus on foundational ML concepts

Code Comparison

introduction_to_ml_with_python:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

handson-ml:

import tensorflow as tf
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=[8]),
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1e-3))

The code comparison shows that introduction_to_ml_with_python focuses on traditional ML algorithms using scikit-learn, while handson-ml includes more advanced topics like deep learning with TensorFlow and Keras.

Python Data Science Handbook: full text in Jupyter Notebooks

Pros of PythonDataScienceHandbook

  • Covers a broader range of data science topics, including data manipulation, visualization, and statistics
  • Provides in-depth explanations and examples for each topic
  • Includes interactive Jupyter notebooks for hands-on learning

Cons of PythonDataScienceHandbook

  • Less focused on machine learning algorithms and techniques
  • May be overwhelming for beginners due to its comprehensive nature
  • Requires more time investment to work through all the material

Code Comparison

PythonDataScienceHandbook:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.show()

introduction_to_ml_with_python:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

The code examples highlight the different focus areas of each repository. PythonDataScienceHandbook emphasizes data visualization and exploration, while introduction_to_ml_with_python concentrates on machine learning algorithms and implementation using scikit-learn.

The fastai deep learning library

Pros of fastai

  • More comprehensive and advanced deep learning library
  • Offers high-level APIs for quick model development
  • Includes cutting-edge techniques and best practices

Cons of fastai

  • Steeper learning curve for beginners
  • Focused primarily on deep learning, less coverage of traditional ML algorithms
  • Requires more computational resources for some tasks

Code Comparison

introduction_to_ml_with_python:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

fastai:

from fastai.vision.all import *

path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

The code examples highlight the difference in focus and complexity between the two libraries. introduction_to_ml_with_python uses scikit-learn for traditional ML tasks, while fastai provides a high-level API for deep learning, particularly in computer vision tasks.

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Comprehensive deep learning framework with extensive capabilities
  • Large ecosystem and community support
  • Highly scalable for production environments

Cons of TensorFlow

  • Steeper learning curve for beginners
  • More complex setup and configuration
  • Potentially overwhelming for simple ML tasks

Code Comparison

Introduction to ML with Python:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

TensorFlow:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(4,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Introduction to ML with Python focuses on scikit-learn for basic ML concepts, while TensorFlow provides a more advanced framework for deep learning and complex models. The former is more suitable for beginners and simple tasks, while the latter offers greater flexibility and power for advanced applications.

README

Introduction to Machine Learning with Python

This repository holds the code for the forthcoming book "Introduction to Machine Learning with Python" by Andreas Mueller and Sarah Guido. You can find details about the book on the O'Reilly website.

The book requires the current stable version of scikit-learn, that is, 0.20.0. Most of the book can also be used with previous versions of scikit-learn, though you need to adjust the imports for everything from the model_selection module, mostly cross_val_score, train_test_split and GridSearchCV.
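
For reference, scikit-learn versions before 0.18 kept those utilities in different modules, so a guarded import along these lines should work with either layout (a sketch, not code from the repository):

# scikit-learn >= 0.18 uses sklearn.model_selection; older releases kept the
# same utilities in sklearn.cross_validation and sklearn.grid_search
try:
    from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV
except ImportError:
    from sklearn.cross_validation import cross_val_score, train_test_split
    from sklearn.grid_search import GridSearchCV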

This repository provides the notebooks from which the book is created, together with the mglearn library of helper functions to create figures and datasets.

For the curious ones, the cover depicts a hellbender.

All datasets are included in the repository, with the exception of the aclImdb dataset, which you can download from the page of Andrew Maas. See the book for details.

If you get ImportError: No module named mglearn, you can try to install mglearn into your Python environment by running pip install mglearn in your terminal or !pip install mglearn in a Jupyter notebook cell.

Errata

Please note that the first print of the book is missing the following line when listing the assumed imports:

from IPython.display import display

Please add this line if you see an error involving display.
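
For reference, the set of imports the book assumes is approximately the following; check the book's preface for the authoritative list:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import mglearn
from IPython.display import display  # the line missing from the first print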

The first print of the book used a function called plot_group_kfold. This has been renamed to plot_label_kfold because of a rename in scikit-learn.

Setup

To run the code, you need the packages numpy, scipy, scikit-learn, matplotlib, pandas and pillow. Some of the visualizations of decision trees and neural network structures also require graphviz. The chapter on text processing also requires nltk and spacy.

The easiest way to set up an environment is by installing Anaconda.

Installing packages with conda:

If you already have a Python environment set up, and you are using the conda package manager, you can get all packages by running

conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz python-graphviz

For the chapter on text processing you also need to install nltk and spacy:

conda install nltk spacy

Installing packages with pip

If you already have a Python environment and are using pip to install packages, you need to run

pip install numpy scipy scikit-learn matplotlib pandas pillow graphviz

You also need to install the graphviz C library, which is easiest using a package manager. If you are using OS X and Homebrew, you can brew install graphviz. If you are on Ubuntu or Debian, you can apt-get install graphviz. Installing graphviz on Windows can be tricky, so using conda / Anaconda is recommended. For the chapter on text processing you also need to install nltk and spacy:

pip install nltk spacy

Downloading English language model

For the text processing chapter, you need to download the English language model for spacy using

python -m spacy download en
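
To confirm the model is available, you can try loading it. This is a quick check assuming one of the older spaCy releases used by the book, where the "en" shortcut link exists:

import spacy

# load the English model installed by "python -m spacy download en"
nlp = spacy.load("en")
doc = nlp("This is a quick sanity check.")
print([(token.text, token.pos_) for token in doc])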

Submitting Errata

If you have errata for the (e-)book, please submit them via the O'Reilly Website. You can submit fixes to the code as pull-requests here, but I'd appreciate it if you would also submit them there, as this repository doesn't hold the "master notebooks".
