ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Top Related Projects
- scikit-learn: machine learning in Python
- TensorFlow: An Open Source Machine Learning Framework for Everyone
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- Keras: Deep Learning for humans
- ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
- handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Quick Overview
ML-From-Scratch is a comprehensive collection of machine learning algorithms implemented from scratch in Python. It aims to provide clear, concise implementations of various ML models, serving as both an educational resource and a practical toolkit for those looking to understand the inner workings of these algorithms.
Pros
- Offers a wide range of ML algorithms, from basic to advanced
- Implementations are clear and well-documented, making them excellent for learning
- Includes visualizations and examples for better understanding
- Pure Python/NumPy implementations that don't rely on heavy ML frameworks
Cons
- Not optimized for production use or large-scale datasets
- May lack some of the latest cutting-edge algorithms
- Implementations are slower than optimized libraries such as scikit-learn
- Limited support for GPU acceleration
Code Examples
- Linear Regression:
from mlfromscratch.supervised_learning import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
- K-Means Clustering:
from mlfromscratch.unsupervised_learning import KMeans
kmeans = KMeans(k=3)
kmeans.fit(X)
clusters = kmeans.predict(X)
- Neural Network (in this library the optimizer and loss are passed as objects, and activations are separate layers):
from mlfromscratch.deep_learning import NeuralNetwork
from mlfromscratch.deep_learning.layers import Dense, Activation
from mlfromscratch.deep_learning.optimizers import Adam
from mlfromscratch.deep_learning.loss_functions import CrossEntropy
model = NeuralNetwork(optimizer=Adam(), loss=CrossEntropy)
model.add(Dense(16, input_shape=(X.shape[1],)))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.fit(X_train, y_train, n_epochs=100, batch_size=32)
Getting Started
To get started with ML-From-Scratch:
1. Clone the repository:

   git clone https://github.com/eriklindernoren/ML-From-Scratch.git

2. Install the required dependencies:

   pip install -r requirements.txt

3. Import and use the desired algorithm:

   from mlfromscratch.supervised_learning import RandomForest

   model = RandomForest()
   model.fit(X_train, y_train)
   predictions = model.predict(X_test)
For more detailed examples and usage instructions, refer to the individual algorithm implementations in the repository.
Competitor Comparisons
scikit-learn: machine learning in Python
Pros of scikit-learn
- Comprehensive and well-established library with a wide range of machine learning algorithms
- Highly optimized for performance, utilizing efficient implementations and Cython
- Extensive documentation, community support, and integration with other scientific Python libraries
Cons of scikit-learn
- Less suitable for educational purposes or understanding the inner workings of algorithms
- More complex codebase, making it harder to contribute or modify for specific needs
- Larger dependency footprint and potentially slower import times
Code Comparison
ML-From-Scratch implementation of k-means clustering:
def _calculate_cluster_means(self, X, clusters):
    return np.array([X[clusters == i].mean(axis=0) for i in range(self.k)])
scikit-learn implementation of k-means clustering:
def _kmeans_single_lloyd(X, n_clusters, max_iter=300, init='k-means++',
                         verbose=False, x_squared_norms=None,
                         random_state=None, tol=1e-4,
                         precompute_distances=True):
    # ... (more complex implementation)
ML-From-Scratch focuses on simplicity and readability, while scikit-learn prioritizes performance and robustness.
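For readers who want to see the whole loop rather than a single helper, here is a minimal sketch of Lloyd's algorithm in plain NumPy (our illustration, taken from neither library; it assumes no cluster ever becomes empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # init from random points
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        clusters = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its cluster
        new_centroids = np.array([X[clusters == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return clusters, centroids
```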
TensorFlow: An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Highly optimized for performance and scalability
- Extensive ecosystem with pre-trained models and tools
- Strong industry adoption and community support
Cons of TensorFlow
- Steeper learning curve for beginners
- More complex setup and configuration
- Abstracts away low-level details, which can hinder understanding
Code Comparison
ML-From-Scratch (Neural Network implementation):
import numpy as np

class NeuralNetwork():
    def __init__(self, n_hidden, n_features, n_output):
        self.W1 = np.random.randn(n_features, n_hidden)
        self.W2 = np.random.randn(n_hidden, n_output)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def forward(self, X):
        self.z1 = np.dot(X, self.W1)
        self.a1 = self.sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2)
        return self.sigmoid(self.z2)
TensorFlow (Neural Network implementation):
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden, activation='sigmoid', input_shape=(n_features,)),
    tf.keras.layers.Dense(n_output, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=32)
The ML-From-Scratch implementation provides a more transparent view of the neural network's inner workings, while TensorFlow offers a higher-level API that simplifies model creation and training at the cost of some abstraction.
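To make the "inner workings" point concrete, here is a hedged sketch of one gradient-descent step for the two-layer network above, assuming sigmoid activations and a mean-squared-error loss (our illustration; the repository's training code is more general):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_step(net, X, y, lr=0.1):
    # Forward pass, mirroring NeuralNetwork.forward above
    a1 = sigmoid(X.dot(net.W1))
    y_hat = sigmoid(a1.dot(net.W2))
    # Backward pass for L = mean((y_hat - y)^2), using sigmoid'(z) = s(z) * (1 - s(z))
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)     # error signal at the output
    delta1 = delta2.dot(net.W2.T) * a1 * (1 - a1)  # error signal at the hidden layer
    net.W2 -= lr * a1.T.dot(delta2) / len(X)       # gradient descent on each weight matrix
    net.W1 -= lr * X.T.dot(delta1) / len(X)
    return float(np.mean((y_hat - y) ** 2))
```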
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Highly optimized and production-ready framework
- Extensive ecosystem with pre-built models and tools
- Dynamic computational graphs for flexible model development
Cons of PyTorch
- Steeper learning curve for beginners
- Larger codebase and more complex architecture
- Less focus on educational aspects of machine learning algorithms
Code Comparison
ML-From-Scratch (Neural Network implementation):
import numpy as np

class NeuralNetwork():
    def __init__(self, n_hidden, n_features, n_classes):
        self.W = np.random.randn(n_hidden, n_features)
        self.b = np.zeros((n_hidden, 1))
        self.V = np.random.randn(n_classes, n_hidden)
        self.c = np.zeros((n_classes, 1))
PyTorch (Neural Network implementation):
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self, n_hidden, n_features, n_classes):
        super().__init__()
        self.hidden = nn.Linear(n_features, n_hidden)
        self.output = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        x = F.relu(self.hidden(x))
        return self.output(x)
ML-From-Scratch focuses on implementing algorithms from scratch, providing a clear understanding of the underlying mathematics. PyTorch, on the other hand, offers a high-level API with built-in optimizations, making it more suitable for large-scale projects and research. While ML-From-Scratch is excellent for learning, PyTorch is the go-to choice for professional development and cutting-edge machine learning applications.
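For completeness, a minimal sketch of how the PyTorch model above would typically be trained, assuming X is a FloatTensor of features and y a LongTensor of class labels (standard PyTorch API, not code from either repository):

```python
import torch
import torch.nn as nn

model = NeuralNetwork(n_hidden=64, n_features=4, n_classes=3)  # class defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # applies softmax internally, so forward() returns raw logits

for epoch in range(100):
    optimizer.zero_grad()          # reset gradients accumulated in the previous step
    loss = criterion(model(X), y)  # forward pass plus loss
    loss.backward()                # autograd computes gradients for all parameters
    optimizer.step()               # apply the parameter update
```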
Keras: Deep Learning for humans
Pros of Keras
- High-level API for easy and fast prototyping of neural networks
- Extensive documentation and large community support
- Seamless integration with TensorFlow backend
Cons of Keras
- Less flexibility for implementing custom algorithms
- Abstraction may hide some low-level details
Code Comparison
ML-From-Scratch implementation of a simple neural network:
class NeuralNetwork():
    def __init__(self, layers):
        self.layers = layers
        self.parameters = self._initialize_parameters()

    def _initialize_parameters(self):
        ...  # Parameter initialization code
Keras implementation of a similar neural network:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential([
    Dense(64, activation='relu', input_shape=(input_dim,)),
    Dense(32, activation='relu'),
    Dense(output_dim, activation='softmax')
])
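To actually run the Keras model it still needs to be compiled and fit; a short sketch using standard Keras calls (X_train, y_train, input_dim and output_dim are assumed):

```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=32)
```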
ML-From-Scratch provides a more detailed, low-level implementation, allowing users to understand the inner workings of neural networks. Keras, on the other hand, offers a more concise and user-friendly approach, abstracting away many implementation details.
While ML-From-Scratch is excellent for learning and understanding machine learning algorithms, Keras is more suitable for practical, production-ready applications due to its efficiency and extensive ecosystem.
ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Pros of ML-For-Beginners
- Comprehensive curriculum structure with lessons, assignments, and quizzes
- Covers a wide range of ML topics, including ethics and real-world applications
- Includes visualizations and interactive elements to enhance learning
Cons of ML-For-Beginners
- Less focus on implementing algorithms from scratch
- May not provide as deep an understanding of the mathematical foundations
- Primarily uses high-level libraries rather than building core components
Code Comparison
ML-From-Scratch (implementing linear regression):
import numpy as np

class LinearRegression():
    def fit(self, X, y):
        # Closed-form solution via the normal equation: theta = (X^T X)^-1 X^T y
        X = np.insert(X, 0, 1, axis=1)  # prepend a bias column of ones
        self.theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

    def predict(self, X):
        X = np.insert(X, 0, 1, axis=1)
        return X.dot(self.theta)
ML-For-Beginners (using scikit-learn for linear regression):
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
ML-From-Scratch focuses on implementing algorithms from the ground up, providing a deeper understanding of the underlying mathematics. ML-For-Beginners offers a more structured learning experience with a broader coverage of ML topics, but relies more on existing libraries for implementation.
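As a quick sanity check that the two approaches agree, one can fit both on synthetic data (our illustration; assumes NumPy and scikit-learn are installed, and uses the from-scratch class defined above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression as SklearnLinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3   # known coefficients and intercept

scratch = LinearRegression()                # the normal-equation class above
scratch.fit(X, y)
sk = SklearnLinearRegression().fit(X, y)

# theta[0] is the intercept because a bias column was prepended in fit()
assert np.allclose(scratch.theta[0], sk.intercept_)
assert np.allclose(scratch.theta[1:], sk.coef_)
```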
handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Pros of handson-ml2
- Comprehensive coverage of machine learning concepts with practical examples
- Utilizes popular libraries like TensorFlow and Scikit-learn for real-world applications
- Includes Jupyter notebooks for interactive learning and experimentation
Cons of handson-ml2
- Less focus on implementing algorithms from scratch
- May not provide as deep an understanding of the underlying mathematics
Code Comparison
ML-From-Scratch (Neural Network implementation):
class NeuralNetwork():
    def __init__(self, n_hidden, n_iterations=3000, learning_rate=0.01):
        self.n_hidden = n_hidden
        self.n_iterations = n_iterations
        self.learning_rate = learning_rate
        self.hidden_activation = Sigmoid()
        self.output_activation = Sigmoid()
handson-ml2 (Neural Network using TensorFlow):
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=[8]),
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(1)
])
The ML-From-Scratch example implements a neural network class from scratch, while handson-ml2 uses Keras (part of TensorFlow) to create a neural network model with pre-built layers and functions.
README
Machine Learning From Scratch
About
Python implementations of some of the fundamental Machine Learning models and algorithms from scratch.
The purpose of this project is not to produce as optimized and computationally efficient algorithms as possible but rather to present the inner workings of them in a transparent and accessible way.
Table of Contents
- About
- Installation
- Examples
- Implementations
- Contact
Installation
$ git clone https://github.com/eriklindernoren/ML-From-Scratch
$ cd ML-From-Scratch
$ python setup.py install
Examples
Polynomial Regression
$ python mlfromscratch/examples/polynomial_regression.py
Figure: Training progress of a regularized polynomial regression model fitting
temperature data measured in Linköping, Sweden 2016.
Classification With CNN
$ python mlfromscratch/examples/convolutional_neural_network.py
+---------+
| ConvNet |
+---------+
Input Shape: (1, 8, 8)
+----------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+----------------------+------------+--------------+
| Conv2D | 160 | (16, 8, 8) |
| Activation (ReLU) | 0 | (16, 8, 8) |
| Dropout | 0 | (16, 8, 8) |
| BatchNormalization | 2048 | (16, 8, 8) |
| Conv2D | 4640 | (32, 8, 8) |
| Activation (ReLU) | 0 | (32, 8, 8) |
| Dropout | 0 | (32, 8, 8) |
| BatchNormalization | 4096 | (32, 8, 8) |
| Flatten | 0 | (2048,) |
| Dense | 524544 | (256,) |
| Activation (ReLU) | 0 | (256,) |
| Dropout | 0 | (256,) |
| BatchNormalization | 512 | (256,) |
| Dense | 2570 | (10,) |
| Activation (Softmax) | 0 | (10,) |
+----------------------+------------+--------------+
Total Parameters: 538570
Training: 100% [------------------------------------------------------------------------] Time: 0:01:55
Accuracy: 0.987465181058
Figure: Classification of the digit dataset using CNN.
Density-Based Clustering
$ python mlfromscratch/examples/dbscan.py
Figure: Clustering of the moons dataset using DBSCAN.
Generating Handwritten Digits
$ python mlfromscratch/unsupervised_learning/generative_adversarial_network.py
+-----------+
| Generator |
+-----------+
Input Shape: (100,)
+------------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense | 25856 | (256,) |
| Activation (LeakyReLU) | 0 | (256,) |
| BatchNormalization | 512 | (256,) |
| Dense | 131584 | (512,) |
| Activation (LeakyReLU) | 0 | (512,) |
| BatchNormalization | 1024 | (512,) |
| Dense | 525312 | (1024,) |
| Activation (LeakyReLU) | 0 | (1024,) |
| BatchNormalization | 2048 | (1024,) |
| Dense | 803600 | (784,) |
| Activation (TanH) | 0 | (784,) |
+------------------------+------------+--------------+
Total Parameters: 1489936
+---------------+
| Discriminator |
+---------------+
Input Shape: (784,)
+------------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense | 401920 | (512,) |
| Activation (LeakyReLU) | 0 | (512,) |
| Dropout | 0 | (512,) |
| Dense | 131328 | (256,) |
| Activation (LeakyReLU) | 0 | (256,) |
| Dropout | 0 | (256,) |
| Dense | 514 | (2,) |
| Activation (Softmax) | 0 | (2,) |
+------------------------+------------+--------------+
Total Parameters: 533762
Figure: Training progress of a Generative Adversarial Network generating
handwritten digits.
Deep Reinforcement Learning
$ python mlfromscratch/examples/deep_q_network.py
+----------------+
| Deep Q-Network |
+----------------+
Input Shape: (4,)
+-------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+-------------------+------------+--------------+
| Dense | 320 | (64,) |
| Activation (ReLU) | 0 | (64,) |
| Dense | 130 | (2,) |
+-------------------+------------+--------------+
Total Parameters: 450
Figure: Deep Q-Network solution to the CartPole-v1 environment in OpenAI gym.
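The summary above describes a two-layer Q-network with a 4-dimensional state input and two actions; as a hedged illustration (not the repository's code), epsilon-greedy action selection over such a network might look like:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def select_action(state, W1, b1, W2, b2, epsilon=0.1):
    # Forward pass: state (4,) -> hidden (64,) -> Q-values (2,), matching the summary above
    q_values = relu(state @ W1 + b1) @ W2 + b2
    # Epsilon-greedy: explore with probability epsilon, otherwise act greedily
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))
```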
Image Reconstruction With RBM
$ python mlfromscratch/examples/restricted_boltzmann_machine.py
Figure: Shows how the network gets better during training at reconstructing
the digit 2 in the MNIST dataset.
Evolutionary Evolved Neural Network
$ python mlfromscratch/examples/neuroevolution.py
+---------------+
| Model Summary |
+---------------+
Input Shape: (64,)
+----------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+----------------------+------------+--------------+
| Dense | 1040 | (16,) |
| Activation (ReLU) | 0 | (16,) |
| Dense | 170 | (10,) |
| Activation (Softmax) | 0 | (10,) |
+----------------------+------------+--------------+
Total Parameters: 1210
Population Size: 100
Generations: 3000
Mutation Rate: 0.01
[0 Best Individual - Fitness: 3.08301, Accuracy: 10.5%]
[1 Best Individual - Fitness: 3.08746, Accuracy: 12.0%]
...
[2999 Best Individual - Fitness: 94.08513, Accuracy: 98.5%]
Test set accuracy: 96.7%
Figure: Classification of the digit dataset by a neural network evolved through neuroevolution.
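The mutation rate reported above governs how often individual weights are perturbed between generations; one common formulation is Gaussian perturbation, sketched here as an illustration rather than the repository's exact operator:

```python
import numpy as np

def mutate_weights(weights, mutation_rate=0.01, scale=0.1):
    # Perturb each weight with probability mutation_rate by adding Gaussian noise
    mask = np.random.rand(*weights.shape) < mutation_rate
    return weights + mask * np.random.normal(scale=scale, size=weights.shape)
```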
Genetic Algorithm
$ python mlfromscratch/examples/genetic_algorithm.py
+--------+
| GA |
+--------+
Description: Implementation of a Genetic Algorithm which aims to produce
the user specified target string. This implementation calculates each
candidate's fitness based on the alphabetical distance between the candidate
and the target. A candidate is selected as a parent with probabilities proportional
to the candidate's fitness. Reproduction is implemented as a single-point
crossover between pairs of parents. Mutation is done by randomly assigning
new characters with uniform probability.
Parameters
----------
Target String: 'Genetic Algorithm'
Population Size: 100
Mutation Rate: 0.05
[0 Closest Candidate: 'CJqlJguPlqzvpoJmb', Fitness: 0.00]
[1 Closest Candidate: 'MCxZxdr nlfiwwGEk', Fitness: 0.01]
[2 Closest Candidate: 'MCxZxdm nlfiwwGcx', Fitness: 0.01]
[3 Closest Candidate: 'SmdsAklMHn kBIwKn', Fitness: 0.01]
[4 Closest Candidate: ' lotneaJOasWfu Z', Fitness: 0.01]
...
[292 Closest Candidate: 'GeneticaAlgorithm', Fitness: 1.00]
[293 Closest Candidate: 'GeneticaAlgorithm', Fitness: 1.00]
[294 Answer: 'Genetic Algorithm']
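As described above, the algorithm needs fitness evaluation, fitness-proportional parent selection, single-point crossover, and uniform random mutation. A compact, hedged sketch of those four pieces (our illustration of the described scheme, not the repository's exact code):

```python
import random
import string

TARGET = "Genetic Algorithm"
LETTERS = string.ascii_letters + " "

def fitness(candidate):
    # Alphabetical distance to the target, mapped so that closer = larger fitness
    dist = sum(abs(ord(a) - ord(b)) for a, b in zip(candidate, TARGET))
    return 1 / (dist + 1)

def select_parent(population, scores):
    # Selection probability proportional to fitness
    return random.choices(population, weights=scores, k=1)[0]

def crossover(parent1, parent2):
    # Single-point crossover between a pair of parents
    point = random.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]

def mutate(candidate, rate=0.05):
    # Replace each character with a uniformly random one with probability `rate`
    return "".join(random.choice(LETTERS) if random.random() < rate else c
                   for c in candidate)
```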
Association Analysis
$ python mlfromscratch/examples/apriori.py
+-------------+
| Apriori |
+-------------+
Minimum Support: 0.25
Minimum Confidence: 0.8
Transactions:
[1, 2, 3, 4]
[1, 2, 4]
[1, 2]
[2, 3, 4]
[2, 3]
[3, 4]
[2, 4]
Frequent Itemsets:
[1, 2, 3, 4, [1, 2], [1, 4], [2, 3], [2, 4], [3, 4], [1, 2, 4], [2, 3, 4]]
Rules:
1 -> 2 (support: 0.43, confidence: 1.0)
4 -> 2 (support: 0.57, confidence: 0.8)
[1, 4] -> 2 (support: 0.29, confidence: 1.0)
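The support and confidence figures in the output can be verified by hand. For example, for the rule 1 -> 2 over the seven transactions listed, a minimal sketch (our illustration, not the repository's implementation):

```python
transactions = [
    [1, 2, 3, 4], [1, 2, 4], [1, 2], [2, 3, 4], [2, 3], [3, 4], [2, 4],
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(set(itemset) <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent) = support(A u C) / support(A)
    return support(antecedent + consequent) / support(antecedent)

print(round(support([1, 2]), 2))   # 0.43, matching the rule 1 -> 2 above
print(confidence([1], [2]))        # 1.0: every transaction containing 1 also contains 2
```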
Implementations
Supervised Learning
- Adaboost
- Bayesian Regression
- Decision Tree
- Elastic Net
- Gradient Boosting
- K Nearest Neighbors
- Lasso Regression
- Linear Discriminant Analysis
- Linear Regression
- Logistic Regression
- Multi-class Linear Discriminant Analysis
- Multilayer Perceptron
- Naive Bayes
- Neuroevolution
- Particle Swarm Optimization of Neural Network
- Perceptron
- Polynomial Regression
- Random Forest
- Ridge Regression
- Support Vector Machine
- XGBoost
Unsupervised Learning
- Apriori
- Autoencoder
- DBSCAN
- FP-Growth
- Gaussian Mixture Model
- Generative Adversarial Network
- Genetic Algorithm
- K-Means
- Partitioning Around Medoids
- Principal Component Analysis
- Restricted Boltzmann Machine
Reinforcement Learning
- Deep Q-Network
Deep Learning
- Neural Network
- Layers
  - Activation Layer
  - Average Pooling Layer
  - Batch Normalization Layer
  - Constant Padding Layer
  - Convolutional Layer
  - Dropout Layer
  - Flatten Layer
  - Fully-Connected (Dense) Layer
  - Fully-Connected RNN Layer
  - Max Pooling Layer
  - Reshape Layer
  - Up Sampling Layer
  - Zero Padding Layer
- Model Types
Contact
If there's some implementation you would like to see here or if you're just feeling social, feel free to email me or connect with me on LinkedIn.