python-machine-learning-book

The "Python Machine Learning (1st edition)" book code repository and info resource

12,475

4,418

Top Related Projects

scikit-learn

62,466

scikit-learn: machine learning in Python

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

ML-For-Beginners

73,270

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

handson-ml2

29,131

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Quick Overview

The "python-machine-learning-book" repository by Sebastian Raschka contains the code examples and resources for the 1st edition of his book "Python Machine Learning." It covers a range of machine learning algorithms and techniques, implemented mainly with NumPy, scikit-learn, and Theano. The TensorFlow and PyTorch snippets under Code Examples are illustrations in those libraries rather than code from the 1st edition.

Pros

Comprehensive coverage of machine learning concepts and algorithms
Practical code examples that align with the book content
Regular updates to keep up with the latest Python and library versions
Includes Jupyter notebooks for interactive learning

Cons

May be overwhelming for absolute beginners in machine learning
Some advanced topics might require additional resources for deeper understanding
Requires familiarity with Python programming

Code Examples

Loading and preprocessing data using scikit-learn:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

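Training a simple neural network using TensorFlow (reusing X_train and y_train from the split above):

import tensorflow as tf

# Small fully connected classifier: 4 Iris features in, 3 class probabilities out
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

# Integer class labels (0, 1, 2) pair with sparse categorical cross-entropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Hold out 20% of the training data for validation during fitting
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)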

Implementing a simple linear regression using PyTorch:

import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

model = LinearRegression()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
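
The PyTorch snippet above stops after creating the model, loss, and optimizer. As a rough, dependency-free sketch of what a training loop with a mean squared error loss and gradient-descent updates does for a one-feature linear model, the plain-Python example below may help. The toy data, the helper names `mse` and `gradient_step`, and the step count are assumptions made for illustration; this is not PyTorch's implementation, and it uses full-batch gradient descent rather than mini-batches.

```python
# Toy data generated from y = 2x + 1 (hypothetical values, for illustration only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mse(w, b):
    """Mean squared error of the line y = w*x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(w, b, lr=0.01):
    """One full-batch gradient-descent update of w and b on the MSE loss."""
    n = len(xs)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b


# Start from zero weights, as a simple (assumed) initialization
w, b = 0.0, 0.0
initial_loss = mse(w, b)

for _ in range(5000):
    w, b = gradient_step(w, b, lr=0.01)

final_loss = mse(w, b)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.3f} -> {final_loss:.6f}")
```

With this noise-free data, the fitted parameters should approach w = 2 and b = 1.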

Getting Started

To get started with the examples in this repository:

Clone the repository:

git clone https://github.com/rasbt/python-machine-learning-book.git

Install the required dependencies:
```
pip install -r requirements.txt
```
Navigate to the desired chapter folder and open the Jupyter notebooks to explore the code examples and explanations.

Competitor Comparisons

scikit-learn

62,466

scikit-learn: machine learning in Python

Pros of scikit-learn

Comprehensive library with a wide range of machine learning algorithms and tools
Well-documented and maintained by a large community of contributors
Seamless integration with other scientific Python libraries like NumPy and pandas

Cons of scikit-learn

Steeper learning curve for beginners due to its extensive functionality
Less focus on in-depth explanations of machine learning concepts
May require additional resources for understanding the theoretical background

Code Comparison

Python Machine Learning Book:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

scikit-learn:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

The code snippets are very similar, as Python Machine Learning Book uses scikit-learn for its examples. The main difference is in the use of the stratify parameter, which ensures proportional representation of classes in the split datasets.
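
To illustrate what the stratify parameter is for, here is a minimal plain-Python sketch of a stratified split. It is not scikit-learn's implementation; the helper name `stratified_split`, the toy labels, and the rounding rule are assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.3, seed=1):
    """Return (train_idx, test_idx) so each class is split in the same ratio."""
    rng = random.Random(seed)

    # Group sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then cut off the test share of that class
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Imbalanced toy labels: 90 samples of class 0 and 10 samples of class 1
y = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(y, test_size=0.3, seed=1)

train_counts = Counter(y[i] for i in train_idx)
test_counts = Counter(y[i] for i in test_idx)
print("train:", dict(train_counts), "test:", dict(test_counts))
```

Because each class is shuffled and cut separately, the 9:1 class ratio of the toy labels is preserved in both subsets.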

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

Comprehensive deep learning framework with extensive ecosystem
Highly scalable for large-scale machine learning projects
Strong support for deployment across various platforms

Cons of TensorFlow

Steeper learning curve for beginners
More complex setup and configuration
Can be overkill for simpler machine learning tasks

Code Comparison

Python-Machine-Learning-Book:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

TensorFlow:

import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

Python-Machine-Learning-Book focuses on explaining machine learning concepts with practical examples using scikit-learn and other libraries. It's ideal for beginners and those wanting to understand the fundamentals.

TensorFlow is a powerful, production-ready framework for building and deploying machine learning models, especially deep learning. It offers more advanced features but requires more expertise to use effectively.

keras

63,156

Deep Learning for humans

Pros of Keras

Comprehensive deep learning framework with high-level APIs
Extensive documentation and large community support
Seamless integration with TensorFlow backend

Cons of Keras

Less focus on traditional machine learning algorithms
May be overwhelming for beginners learning basic concepts

Code Comparison

Python Machine Learning Book:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dense(1, activation='sigmoid'))

Summary

Python Machine Learning Book is an educational resource covering various machine learning concepts and algorithms, ideal for beginners and those seeking a comprehensive understanding of ML fundamentals. It provides practical examples using popular libraries like scikit-learn.

Keras, on the other hand, is a powerful deep learning framework focused on neural networks and deep learning applications. It offers high-level APIs for building and training complex models, making it suitable for both research and production environments.

While Python Machine Learning Book serves as a learning tool, Keras is a practical framework for implementing deep learning solutions in real-world scenarios.

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

Extensive deep learning framework with GPU acceleration
Large community and ecosystem of tools/libraries
Flexible and dynamic computational graph

Cons of PyTorch

Steeper learning curve for beginners
More complex setup and installation process
Primarily focused on deep learning, less suited for traditional ML

Code Comparison

Python Machine Learning Book:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

PyTorch:

import torch
from torch.utils.data import random_split

dataset = YourDataset()
train_size = int(0.7 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

Python Machine Learning Book focuses on scikit-learn and pandas for traditional machine learning tasks, making it more accessible for beginners and those interested in a broad overview of ML concepts. It covers various algorithms and techniques with practical examples.

PyTorch is a powerful deep learning framework that offers more flexibility and control over neural network architectures. It's better suited for advanced users and researchers working on complex deep learning projects, providing tools for building and training neural networks efficiently.

ML-For-Beginners

73,270

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

Pros of ML-For-Beginners

More comprehensive curriculum covering various ML topics
Includes hands-on projects and quizzes for practical learning
Regularly updated with contributions from the community

Cons of ML-For-Beginners

Less focus on in-depth mathematical explanations
May not cover advanced ML techniques as extensively
Primarily uses Scikit-learn, limiting exposure to other libraries

Code Comparison

ML-For-Beginners:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

Python-Machine-Learning-Book:

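from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Stratified 70/30 split keeps class proportions equal in both subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
# Fit the scaler on the training data only, then reuse its statistics for the test data
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
# Train a classifier on the standardized features (default hyperparameters, for illustration)
lr = LogisticRegression()
lr.fit(X_train_std, y_train)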
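
The StandardScaler step above fits on the training data and then applies the same statistics to the test data. A minimal plain-Python sketch of that idea for a single feature follows. The helper names `fit_scaler` and `transform` and the toy values are assumptions for illustration; the sketch uses the population standard deviation, which is also what StandardScaler uses.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    return mean(column), pstdev(column)


def transform(column, mu, sigma):
    """Standardize values with previously learned statistics."""
    return [(value - mu) / sigma for value in column]


# Hypothetical single-feature data, for illustration only
train_feature = [2.0, 4.0, 6.0, 8.0]
test_feature = [5.0, 10.0]

# Statistics come from the training data only
mu, sigma = fit_scaler(train_feature)

train_std = transform(train_feature, mu, sigma)
# The test data reuses the training mean and standard deviation
test_std = transform(test_feature, mu, sigma)

print(train_std, test_std)
```

Reusing the training statistics keeps information from the test set out of the preprocessing step, so the standardized test values need not have zero mean or unit variance themselves.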

handson-ml2

29,131

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Pros of handson-ml2

More comprehensive coverage of deep learning topics
Includes practical exercises and Jupyter notebooks for hands-on learning
Regularly updated with newer machine learning techniques and libraries

Cons of handson-ml2

May be overwhelming for absolute beginners due to its breadth of content
Focuses more on TensorFlow and Keras, with less emphasis on other libraries

Code Comparison

python-machine-learning-book:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

handson-ml2:

from sklearn.model_selection import train_test_split
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=0.2, random_state=42)

Both repositories use similar code for data splitting, but handson-ml2 includes an additional validation set, which is useful for more advanced model evaluation and hyperparameter tuning.
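
As a rough sketch of the resulting proportions, the two-stage split can be mimicked on plain index lists: 20% of the data is held out for testing, and 20% of the remaining 80% (16% of the total) becomes the validation set, leaving 64% for training. The helper `split_indices` and the sample count of 1000 are assumptions for illustration, and the rounding rule is not necessarily the one scikit-learn applies.

```python
import random


def split_indices(indices, test_size, seed):
    """Shuffle a copy of the indices and return (remaining, held_out)."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_held_out = round(len(shuffled) * test_size)
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Hypothetical dataset of 1000 samples, represented by their indices
all_idx = list(range(1000))

# Stage 1: hold out 20% of all samples as the test set
train_full, test = split_indices(all_idx, test_size=0.2, seed=42)
# Stage 2: hold out 20% of the remaining samples as the validation set
train, valid = split_indices(train_full, test_size=0.2, seed=42)

print(len(train), len(valid), len(test))
```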

README

Python Machine Learning book code repository

IMPORTANT NOTE (09/21/2017):

This GitHub repository contains the code examples of the 1st Edition of Python Machine Learning book. If you are looking for the code examples of the 2nd Edition, please refer to this repository instead.

What you can expect are 400 pages rich in useful material, covering just about everything you need to know to get started with machine learning, from theory to the actual code that you can directly put into action! This is not yet another "this is how scikit-learn works" book. I aim to explain all the underlying concepts, tell you everything you need to know in terms of best practices and caveats, and we will put those concepts into action mainly using NumPy, scikit-learn, and Theano.

Not sure if this book is for you? Please check out the excerpts from the Foreword and Preface, or take a look at the FAQ section for further information.

1st edition, published September 23rd 2015
Paperback: 454 pages
Publisher: Packt Publishing
Language: English
ISBN-10: 1783555130
ISBN-13: 978-1783555130
Kindle ASIN: B00YSILNL0

German ISBN-13: 978-3958454224
Japanese ISBN-13: 978-4844380603
Italian ISBN-13: 978-8850333974
Chinese (traditional) ISBN-13: 978-9864341405
Chinese (mainland) ISBN-13: 978-7111558804
Korean ISBN-13: 979-1187497035
Russian ISBN-13: 978-5970604090

Table of Contents and Code Notebooks

Simply click on the ipynb/nbviewer links next to the chapter headlines to view the code examples (currently, the internal document links are only supported by the NbViewer version). Please note that these are just the code examples accompanying the book, which I uploaded for your convenience; be aware that these notebooks may not be useful without the formulae and descriptive text.

Excerpts from the Foreword and Preface
Instructions for setting up Python and the Jupyter Notebook

Machine Learning - Giving Computers the Ability to Learn from Data [dir] [ipynb] [nbviewer]
Training Machine Learning Algorithms for Classification [dir] [ipynb] [nbviewer]
A Tour of Machine Learning Classifiers Using Scikit-Learn [dir] [ipynb] [nbviewer]
Building Good Training Sets – Data Pre-Processing [dir] [ipynb] [nbviewer]
Compressing Data via Dimensionality Reduction [dir] [ipynb] [nbviewer]
Learning Best Practices for Model Evaluation and Hyperparameter Optimization [dir] [ipynb] [nbviewer]
Combining Different Models for Ensemble Learning [dir] [ipynb] [nbviewer]
Applying Machine Learning to Sentiment Analysis [dir] [ipynb] [nbviewer]
Embedding a Machine Learning Model into a Web Application [dir] [ipynb] [nbviewer]
Predicting Continuous Target Variables with Regression Analysis [dir] [ipynb] [nbviewer]
Working with Unlabeled Data – Clustering Analysis [dir] [ipynb] [nbviewer]
Training Artificial Neural Networks for Image Recognition [dir] [ipynb] [nbviewer]
Parallelizing Neural Network Training via Theano [dir] [ipynb] [nbviewer]

Equation Reference

[PDF] [TEX]

Slides for Teaching

A big thanks to Dmitriy Dligach for sharing his slides from his machine learning course that is currently offered at Loyola University Chicago.

https://github.com/dmitriydligach/PyMLSlides

Additional Math and NumPy Resources

Some readers asked about Math and NumPy primers, which were not included in the book due to length limitations. I recently put together such resources for another book and made these chapters freely available online, in the hope that they also serve as helpful background material for this book:

Algebra Basics [PDF] [EPUB]
A Calculus and Differentiation Primer [PDF] [EPUB]
Introduction to NumPy [PDF] [EPUB] [Code Notebook]

Citing this Book

You are very welcome to re-use the code snippets or other contents from this book in scientific publications and other works; in this case, I would appreciate citations to the original source:

BibTeX:

@Book{raschka2015python,
 author = {Raschka, Sebastian},
 title = {Python Machine Learning},
 publisher = {Packt Publishing},
 year = {2015},
 address = {Birmingham, UK},
 isbn = {1783555130}
 }

MLA:

Raschka, Sebastian. Python machine learning. Birmingham, UK: Packt Publishing, 2015. Print.

Feedback & Reviews

Short review snippets

Sebastian Raschka's new book, Python Machine Learning, has just been released. I got a chance to read a review copy and it's just as I expected - really great! It's well organized, super easy to follow, and it not only offers a good foundation for smart, non-experts, practitioners will get some ideas and learn new tricks here as well.
– Lon Riesberg at Data Elixir

Superb job! Thus far, for me it seems to have hit the right balance of theory and practice…math and code!
– Brian Thomas

I've read (virtually) every Machine Learning title based around Scikit-learn and this is hands-down the best one out there.
– Jason Wolosonovich

The best book I've seen to come out of PACKT Publishing. This is a very well written introduction to machine learning with Python. As others have noted, a perfect mixture of theory and application.
– Josh D.

A book with a blend of qualities that is hard to come by: combines the needed mathematics to control the theory with the applied coding in Python. Also great to see it doesn't waste paper in giving a primer on Python as many other books do just to appeal to the greater audience. You can tell it's been written by knowledgeable writers and not just DIY geeks.
– Amazon Customer

Sebastian Raschka created an amazing machine learning tutorial which combines theory with practice. The book explains machine learning from a theoretical perspective and has tons of coded examples to show how you would actually use the machine learning technique. It can be read by a beginner or advanced programmer.
– William P. Ross, 7 Must Read Python Books

Longer reviews

If you need help to decide whether this book is for you, check out some of the "longer" reviews linked below. (If you wrote a review, please let me know, and I'd be happy to add it to the list).

Python Machine Learning Review by Patrick Hill at the Chartered Institute for IT
Book Review: Python Machine Learning by Sebastian Raschka by Alex Turner at WhatPixel

Links

ebook and paperback at Amazon.com, Amazon.co.uk, Amazon.de
ebook and paperback from Packt (the publisher)
at other book stores: Google Books, O'Reilly, Safari, Barnes & Noble, Apple iBooks, ...
social platforms: Goodreads

Translations

Italian translation via "Apogeo"
German translation via "mitp Verlag"
Japanese translation via "Impress Top Gear"
Chinese translation (traditional Chinese)
Chinese translation (simplified Chinese)
Korean translation via "Kyobo"
Polish translation via "Helion"

Literature References & Further Reading Resources

Errata

Bonus Notebooks (not in the book)

Logistic Regression Implementation [dir] [ipynb] [nbviewer]
A Basic Pipeline and Grid Search Setup [dir] [ipynb] [nbviewer]
An Extended Nested Cross-Validation Example [dir] [ipynb] [nbviewer]
A Simple Barebones Flask Webapp Template [view directory][download as zip-file]
Reading handwritten digits from MNIST into NumPy arrays [GitHub ipynb] [nbviewer]
Scikit-learn Model Persistence using JSON [GitHub ipynb] [nbviewer]
Multinomial logistic regression / softmax regression [GitHub ipynb] [nbviewer]

"Related Content" (not in the book)

SciPy 2016

We had such a great time at SciPy 2016 in Austin! It was a real pleasure to meet and chat with so many readers of my book. Thanks so much for all the nice words and feedback! And in case you missed it, Andreas Mueller and I gave an Introduction to Machine Learning with Scikit-learn; if you are interested, the video recordings of Part I and Part II are now online!

PyData Chicago 2016

I attempted the rather challenging task of introducing scikit-learn & machine learning in just 90 minutes at PyData Chicago 2016. The slides and tutorial material are available at "Learning scikit-learn -- An Introduction to Machine Learning in Python."

Note

I have set up a separate library, mlxtend, containing additional implementations of machine learning (and general "data science") algorithms. I also added implementations from this book (for example, the decision region plot, the artificial neural network, and sequential feature selection algorithms) with additional functionality.

Dear readers,
first of all, I want to thank all of you for the great support! I am really happy about all the great feedback you sent me so far, and I am glad that the book has been so useful to a broad audience.

Over the last couple of months, I received hundreds of emails, and I tried to answer as many as possible in the time I had available. To make them useful to other readers as well, I collected many of my answers in the FAQ section (below).

In addition, some of you asked me about a platform for readers to discuss the contents of the book. I hope that the following discussion board provides an opportunity for you to discuss and share your knowledge with other readers:

Google Groups Discussion Board

(And I will try my best to answer questions myself if time allows! :))

The only thing to do with good advice is to pass it on. It is never of any use to oneself.
– Oscar Wilde

Examples and Applications by Readers

Once again, I have to say (big!) THANKS for all the nice feedback about the book. I've received many emails from readers, who put the concepts and examples from this book out into the real world and make good use of them in their projects. In this section, I am starting to gather some of these great applications, and I'd be more than happy to add your project to this list -- just shoot me a quick mail!

FAQ

General Questions

Questions about the Machine Learning Field

Questions about ML Concepts and Statistics

Cost Functions and Optimization

Regression Analysis

What is the difference between Pearson R and Simple Linear Regression?

Tree models

Model evaluation

Logistic Regression

Neural Networks and Deep Learning

Preprocessing, Feature Selection and Extraction

Naive Bayes

Other

Programming Languages and Libraries for Data Science and Machine Learning

Questions about the Book

Contact

I am happy to answer questions! Just write me an email or consider asking the question on the Google Groups Email List.

If you are interested in keeping in touch, I have quite a lively twitter stream (@rasbt) all about data science and machine learning. I also maintain a blog where I post all of the things I am particularly excited about.
