eriklindernoren/ML-From-Scratch

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Top Related Projects

  • scikit-learn: machine learning in Python
  • TensorFlow: An Open Source Machine Learning Framework for Everyone
  • PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • Keras: Deep Learning for humans
  • ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
  • handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Quick Overview

ML-From-Scratch is a comprehensive collection of machine learning algorithms implemented from scratch in Python. It aims to provide clear, concise implementations of various ML models, serving as both an educational resource and a practical toolkit for those looking to understand the inner workings of these algorithms.

Pros

  • Offers a wide range of ML algorithms, from basic to advanced
  • Implementations are clear and well-documented, making them excellent for learning
  • Includes visualizations and examples for better understanding
  • Python and NumPy implementations that do not rely on heavy ML libraries

Cons

  • Not optimized for production use or large-scale datasets
  • May lack some of the latest cutting-edge algorithms
  • Implementations are likely to be slower than optimized libraries like scikit-learn
  • No built-in GPU acceleration, since the implementations are built on NumPy

Code Examples

  1. Linear Regression:

    from mlfromscratch.supervised_learning import LinearRegression

    model = LinearRegression()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

  2. K-Means Clustering:

    from mlfromscratch.unsupervised_learning import KMeans

    kmeans = KMeans(k=3)
    kmeans.fit(X)
    clusters = kmeans.predict(X)

  3. Neural Network:

    from mlfromscratch.deep_learning import NeuralNetwork
    from mlfromscratch.deep_learning.layers import Dense

    model = NeuralNetwork(optimizer="adam", loss="categorical_crossentropy")
    model.add(Dense(16, activation="relu", input_shape=(X.shape[1],)))
    model.add(Dense(10, activation="softmax"))
    model.fit(X_train, y_train, n_epochs=100, batch_size=32)

Getting Started

To get started with ML-From-Scratch:

  1. Clone the repository:

    git clone https://github.com/eriklindernoren/ML-From-Scratch.git
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    
  3. Import and use the desired algorithm:

    from mlfromscratch.supervised_learning import RandomForest
    
    model = RandomForest()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    

For more detailed examples and usage instructions, refer to the individual algorithm implementations in the repository.

Competitor Comparisons

scikit-learn: machine learning in Python

Pros of scikit-learn

  • Comprehensive and well-established library with a wide range of machine learning algorithms
  • Highly optimized for performance, utilizing efficient implementations and Cython
  • Extensive documentation, community support, and integration with other scientific Python libraries

Cons of scikit-learn

  • Less suitable for educational purposes or understanding the inner workings of algorithms
  • More complex codebase, making it harder to contribute or modify for specific needs
  • Larger dependency footprint and potentially slower import times

Code Comparison

ML-From-Scratch implementation of k-means clustering:

def _calculate_cluster_means(self, X, clusters):
    return np.array([X[clusters == i].mean(axis=0) for i in range(self.k)])

scikit-learn implementation of k-means clustering:

def _kmeans_single_lloyd(X, n_clusters, max_iter=300, init='k-means++',
                         verbose=False, x_squared_norms=None,
                         random_state=None, tol=1e-4,
                         precompute_distances=True):
    # ... (more complex implementation)

ML-From-Scratch focuses on simplicity and readability, while scikit-learn prioritizes performance and robustness.
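
To show where the mean-update step above fits, here is a minimal sketch of a full Lloyd's-algorithm loop in NumPy. It is an illustration only, not the repository's code: the function name `kmeans_sketch`, the random initialization, and the handling of empty clusters are all assumptions.

```python
import numpy as np


def kmeans_sketch(X, k, max_iterations=100, seed=0):
    """Minimal Lloyd's algorithm (hypothetical helper, not the repository's API)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centroids from k distinct random samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iterations):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        clusters = distances.argmin(axis=1)
        # Recompute each centroid as the mean of its members; keep the old
        # centroid if a cluster ends up empty (assumed fallback).
        new_centroids = np.array([
            X[clusters == i].mean(axis=0) if np.any(clusters == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, clusters
```

Calling `kmeans_sketch(X, k=3)` on an array of shape (n_samples, n_features) returns the centroids and one cluster index per sample.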

TensorFlow: An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Highly optimized for performance and scalability
  • Extensive ecosystem with pre-trained models and tools
  • Strong industry adoption and community support

Cons of TensorFlow

  • Steeper learning curve for beginners
  • More complex setup and configuration
  • Abstracts away low-level details, which can hinder understanding

Code Comparison

ML-From-Scratch (Neural Network implementation):

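import numpy as np

class NeuralNetwork():
    def __init__(self, n_hidden, n_features, n_output):
        # Randomly initialized weights for the hidden and output layers
        self.W1 = np.random.randn(n_features, n_hidden)
        self.W2 = np.random.randn(n_hidden, n_output)

    def sigmoid(self, z):
        # Logistic activation, applied element-wise
        return 1 / (1 + np.exp(-z))

    def forward(self, X):
        self.z1 = np.dot(X, self.W1)
        self.a1 = self.sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2)
        return self.sigmoid(self.z2)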

TensorFlow (Neural Network implementation):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden, activation='sigmoid', input_shape=(n_features,)),
    tf.keras.layers.Dense(n_output, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=32)

The ML-From-Scratch implementation provides a more transparent view of the neural network's inner workings, while TensorFlow offers a higher-level API that simplifies model creation and training but hides the low-level details.

PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • Highly optimized and production-ready framework
  • Extensive ecosystem with pre-built models and tools
  • Dynamic computational graphs for flexible model development

Cons of PyTorch

  • Steeper learning curve for beginners
  • Larger codebase and more complex architecture
  • Less focus on educational aspects of machine learning algorithms

Code Comparison

ML-From-Scratch (Neural Network implementation):

class NeuralNetwork():
    def __init__(self, n_hidden, n_features, n_classes):
        self.W = np.random.randn(n_hidden, n_features)
        self.b = np.zeros((n_hidden, 1))
        self.V = np.random.randn(n_classes, n_hidden)
        self.c = np.zeros((n_classes, 1))

PyTorch (Neural Network implementation):

import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self, n_hidden, n_features, n_classes):
        super().__init__()
        self.hidden = nn.Linear(n_features, n_hidden)
        self.output = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        x = F.relu(self.hidden(x))
        return self.output(x)

ML-From-Scratch focuses on implementing algorithms from scratch, providing a clear understanding of the underlying mathematics. PyTorch, on the other hand, offers a high-level API with built-in optimizations, making it more suitable for large-scale projects and research. While ML-From-Scratch is excellent for learning, PyTorch is the go-to choice for professional development and cutting-edge machine learning applications.

Keras: Deep Learning for humans

Pros of Keras

  • High-level API for easy and fast prototyping of neural networks
  • Extensive documentation and large community support
  • Seamless integration with TensorFlow backend

Cons of Keras

  • Less flexibility for implementing custom algorithms
  • Abstraction may hide some low-level details

Code Comparison

ML-From-Scratch implementation of a simple neural network:

class NeuralNetwork():
    def __init__(self, layers):
        self.layers = layers
        self.parameters = self._initialize_parameters()

    def _initialize_parameters(self):
        # Parameter initialization code

Keras implementation of a similar neural network:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(input_dim,)),
    Dense(32, activation='relu'),
    Dense(output_dim, activation='softmax')
])

ML-From-Scratch provides a more detailed, low-level implementation, allowing users to understand the inner workings of neural networks. Keras, on the other hand, offers a more concise and user-friendly approach, abstracting away many implementation details.

While ML-From-Scratch is excellent for learning and understanding machine learning algorithms, Keras is more suitable for practical, production-ready applications due to its efficiency and extensive ecosystem.

ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

Pros of ML-For-Beginners

  • Comprehensive curriculum structure with lessons, assignments, and quizzes
  • Covers a wide range of ML topics, including ethics and real-world applications
  • Includes visualizations and interactive elements to enhance learning

Cons of ML-For-Beginners

  • Less focus on implementing algorithms from scratch
  • May not provide as deep an understanding of the mathematical foundations
  • Primarily uses high-level libraries rather than building core components

Code Comparison

ML-From-Scratch (implementing linear regression):

class LinearRegression():
    def fit(self, X, y):
        X = np.insert(X, 0, 1, axis=1)
        self.theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

    def predict(self, X):
        X = np.insert(X, 0, 1, axis=1)
        return X.dot(self.theta)

ML-For-Beginners (using scikit-learn for linear regression):

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

ML-From-Scratch focuses on implementing algorithms from the ground up, providing a deeper understanding of the underlying mathematics. ML-For-Beginners offers a more structured learning experience with a broader coverage of ML topics, but relies more on existing libraries for implementation.
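
The closed-form fit shown above inverts X^T X explicitly, which fails when that matrix is singular (for example, with duplicated features). As a hedged alternative, the sketch below solves the same least-squares problem with `np.linalg.lstsq`. The function names `fit_least_squares` and `predict` are hypothetical and are not part of either project.

```python
import numpy as np


def fit_least_squares(X, y):
    """Return the weights (bias first) that minimize ||X_b @ theta - y||^2."""
    X_b = np.insert(X, 0, 1, axis=1)  # prepend a column of ones for the bias
    # lstsq solves the least-squares problem without forming an explicit
    # inverse of X_b.T @ X_b, so it also copes with rank-deficient inputs.
    theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
    return theta


def predict(X, theta):
    X_b = np.insert(X, 0, 1, axis=1)
    return X_b @ theta


# Usage on noise-free synthetic data generated from y = 2 + 3x.
X = np.arange(5, dtype=float).reshape(-1, 1)
y = 2.0 + 3.0 * X[:, 0]
theta = fit_least_squares(X, y)
```

On this synthetic data the recovered weights should match the generating intercept and slope up to floating-point error.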

handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Pros of handson-ml2

  • Comprehensive coverage of machine learning concepts with practical examples
  • Utilizes popular libraries like TensorFlow and Scikit-learn for real-world applications
  • Includes Jupyter notebooks for interactive learning and experimentation

Cons of handson-ml2

  • Less focus on implementing algorithms from scratch
  • May not provide as deep an understanding of the underlying mathematics

Code Comparison

ML-From-Scratch (Neural Network implementation):

class NeuralNetwork():
    def __init__(self, n_hidden, n_iterations=3000, learning_rate=0.01):
        self.n_hidden = n_hidden
        self.n_iterations = n_iterations
        self.learning_rate = learning_rate
        self.hidden_activation = Sigmoid()
        self.output_activation = Sigmoid()

handson-ml2 (Neural Network using TensorFlow):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=[8]),
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(1)
])

The ML-From-Scratch example implements a neural network class from scratch, while handson-ml2 uses Keras (part of TensorFlow) to create a neural network model with pre-built layers and functions.

README

Machine Learning From Scratch

About

Python implementations of some of the fundamental Machine Learning models and algorithms from scratch.

The purpose of this project is not to produce the most optimized and computationally efficient algorithms possible, but rather to present their inner workings in a transparent and accessible way.

Installation

$ git clone https://github.com/eriklindernoren/ML-From-Scratch
$ cd ML-From-Scratch
$ python setup.py install

Examples

Polynomial Regression

$ python mlfromscratch/examples/polynomial_regression.py

Figure: Training progress of a regularized polynomial regression model fitting
temperature data measured in Linköping, Sweden 2016.

Classification With CNN

$ python mlfromscratch/examples/convolutional_neural_network.py

+---------+
| ConvNet |
+---------+
Input Shape: (1, 8, 8)
+----------------------+------------+--------------+
| Layer Type           | Parameters | Output Shape |
+----------------------+------------+--------------+
| Conv2D               | 160        | (16, 8, 8)   |
| Activation (ReLU)    | 0          | (16, 8, 8)   |
| Dropout              | 0          | (16, 8, 8)   |
| BatchNormalization   | 2048       | (16, 8, 8)   |
| Conv2D               | 4640       | (32, 8, 8)   |
| Activation (ReLU)    | 0          | (32, 8, 8)   |
| Dropout              | 0          | (32, 8, 8)   |
| BatchNormalization   | 4096       | (32, 8, 8)   |
| Flatten              | 0          | (2048,)      |
| Dense                | 524544     | (256,)       |
| Activation (ReLU)    | 0          | (256,)       |
| Dropout              | 0          | (256,)       |
| BatchNormalization   | 512        | (256,)       |
| Dense                | 2570       | (10,)        |
| Activation (Softmax) | 0          | (10,)        |
+----------------------+------------+--------------+
Total Parameters: 538570

Training: 100% [------------------------------------------------------------------------] Time: 0:01:55
Accuracy: 0.987465181058

Figure: Classification of the digit dataset using CNN.
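
The parameter counts in the summary above can be reproduced by hand. The sketch below assumes 3x3 convolution filters and a batch-normalization layer that keeps one scale and one shift value per element of its input. Both assumptions are inferred from the numbers in the table rather than taken from the source code, and the helper names are hypothetical.

```python
from math import prod


def conv2d_params(in_channels, n_filters, kernel=(3, 3)):
    # One kernel per (filter, input channel) pair, plus one bias per filter.
    return n_filters * in_channels * kernel[0] * kernel[1] + n_filters


def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit.
    return n_inputs * n_units + n_units


def batchnorm_params(input_shape):
    # Assumed: one gamma and one beta per element of the input shape.
    return 2 * prod(input_shape)


counts = [
    conv2d_params(1, 16),            # Conv2D, 1 -> 16 channels
    batchnorm_params((16, 8, 8)),
    conv2d_params(16, 32),           # Conv2D, 16 -> 32 channels
    batchnorm_params((32, 8, 8)),
    dense_params(32 * 8 * 8, 256),   # Dense after Flatten
    batchnorm_params((256,)),
    dense_params(256, 10),           # output layer
]
total = sum(counts)
print(counts, total)
```

Under these assumptions the per-layer values and their sum agree with the table: 160, 2048, 4640, 4096, 524544, 512, 2570, for a total of 538570.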

Density-Based Clustering

$ python mlfromscratch/examples/dbscan.py

Figure: Clustering of the moons dataset using DBSCAN.

Generating Handwritten Digits

$ python mlfromscratch/unsupervised_learning/generative_adversarial_network.py

+-----------+
| Generator |
+-----------+
Input Shape: (100,)
+------------------------+------------+--------------+
| Layer Type             | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense                  | 25856      | (256,)       |
| Activation (LeakyReLU) | 0          | (256,)       |
| BatchNormalization     | 512        | (256,)       |
| Dense                  | 131584     | (512,)       |
| Activation (LeakyReLU) | 0          | (512,)       |
| BatchNormalization     | 1024       | (512,)       |
| Dense                  | 525312     | (1024,)      |
| Activation (LeakyReLU) | 0          | (1024,)      |
| BatchNormalization     | 2048       | (1024,)      |
| Dense                  | 803600     | (784,)       |
| Activation (TanH)      | 0          | (784,)       |
+------------------------+------------+--------------+
Total Parameters: 1489936

+---------------+
| Discriminator |
+---------------+
Input Shape: (784,)
+------------------------+------------+--------------+
| Layer Type             | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense                  | 401920     | (512,)       |
| Activation (LeakyReLU) | 0          | (512,)       |
| Dropout                | 0          | (512,)       |
| Dense                  | 131328     | (256,)       |
| Activation (LeakyReLU) | 0          | (256,)       |
| Dropout                | 0          | (256,)       |
| Dense                  | 514        | (2,)         |
| Activation (Softmax)   | 0          | (2,)         |
+------------------------+------------+--------------+
Total Parameters: 533762

Figure: Training progress of a Generative Adversarial Network generating
handwritten digits.

Deep Reinforcement Learning

$ python mlfromscratch/examples/deep_q_network.py

+----------------+
| Deep Q-Network |
+----------------+
Input Shape: (4,)
+-------------------+------------+--------------+
| Layer Type        | Parameters | Output Shape |
+-------------------+------------+--------------+
| Dense             | 320        | (64,)        |
| Activation (ReLU) | 0          | (64,)        |
| Dense             | 130        | (2,)         |
+-------------------+------------+--------------+
Total Parameters: 450

Figure: Deep Q-Network solution to the CartPole-v1 environment in OpenAI Gym.

Image Reconstruction With RBM

$ python mlfromscratch/examples/restricted_boltzmann_machine.py

Figure: Reconstructions of the digit 2 from the MNIST dataset, improving as
the network trains.

Evolutionary Evolved Neural Network

$ python mlfromscratch/examples/neuroevolution.py

+---------------+
| Model Summary |
+---------------+
Input Shape: (64,)
+----------------------+------------+--------------+
| Layer Type           | Parameters | Output Shape |
+----------------------+------------+--------------+
| Dense                | 1040       | (16,)        |
| Activation (ReLU)    | 0          | (16,)        |
| Dense                | 170        | (10,)        |
| Activation (Softmax) | 0          | (10,)        |
+----------------------+------------+--------------+
Total Parameters: 1210

Population Size: 100
Generations: 3000
Mutation Rate: 0.01

[0 Best Individual - Fitness: 3.08301, Accuracy: 10.5%]
[1 Best Individual - Fitness: 3.08746, Accuracy: 12.0%]
...
[2999 Best Individual - Fitness: 94.08513, Accuracy: 98.5%]
Test set accuracy: 96.7%

Figure: Classification of the digit dataset by a neural network that was
evolved with an evolutionary algorithm.

Genetic Algorithm

$ python mlfromscratch/examples/genetic_algorithm.py

+--------+
|   GA   |
+--------+
Description: Implementation of a Genetic Algorithm which aims to produce
the user specified target string. This implementation calculates each
candidate's fitness based on the alphabetical distance between the candidate
and the target. A candidate is selected as a parent with probabilities proportional
to the candidate's fitness. Reproduction is implemented as a single-point
crossover between pairs of parents. Mutation is done by randomly assigning
new characters with uniform probability.

Parameters
----------
Target String: 'Genetic Algorithm'
Population Size: 100
Mutation Rate: 0.05

[0 Closest Candidate: 'CJqlJguPlqzvpoJmb', Fitness: 0.00]
[1 Closest Candidate: 'MCxZxdr nlfiwwGEk', Fitness: 0.01]
[2 Closest Candidate: 'MCxZxdm nlfiwwGcx', Fitness: 0.01]
[3 Closest Candidate: 'SmdsAklMHn kBIwKn', Fitness: 0.01]
[4 Closest Candidate: '  lotneaJOasWfu Z', Fitness: 0.01]
...
[292 Closest Candidate: 'GeneticaAlgorithm', Fitness: 1.00]
[293 Closest Candidate: 'GeneticaAlgorithm', Fitness: 1.00]
[294 Answer: 'Genetic Algorithm']
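
The procedure described in the output above (fitness-proportional parent selection, single-point crossover, uniform random mutation) can be sketched in a few lines. This is an illustrative sketch, not the repository's implementation: the alphabet ordering, the exact fitness formula, and every function name here are assumptions.

```python
import random
import string

# Assumed alphabet: space first, then lowercase and uppercase letters.
LETTERS = " " + string.ascii_letters


def fitness(candidate, target):
    # Assumed fitness: inverse of the summed alphabet-index distance to the target.
    distance = sum(
        abs(LETTERS.index(a) - LETTERS.index(b)) for a, b in zip(candidate, target)
    )
    return 1.0 / (distance + 1e-6)


def mutate(candidate, mutation_rate, rng):
    # Replace each character by a uniformly drawn one with probability mutation_rate.
    return "".join(
        rng.choice(LETTERS) if rng.random() < mutation_rate else ch
        for ch in candidate
    )


def crossover(parent1, parent2, rng):
    # Single-point crossover: swap the tails after a random cut point.
    point = rng.randrange(len(parent1))
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]


def run_ga(target, population_size=100, mutation_rate=0.05,
           max_generations=5000, seed=0):
    # population_size is assumed to be even so that parents pair up cleanly.
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(LETTERS) for _ in target) for _ in range(population_size)
    ]
    best = population[0]
    for generation in range(max_generations):
        scores = [fitness(c, target) for c in population]
        best = population[scores.index(max(scores))]
        if best == target:
            return generation, best
        # Fitness-proportional (roulette-wheel) parent selection.
        parents = rng.choices(population, weights=scores, k=population_size)
        population = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            for child in crossover(p1, p2, rng):
                population.append(mutate(child, mutation_rate, rng))
    return max_generations, best
```

Calling `run_ga("Genetic Algorithm")` returns the generation at which the search stopped and the closest candidate found. The number of generations needed depends on the seed and will not necessarily match the run shown above.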

Association Analysis

$ python mlfromscratch/examples/apriori.py

+-------------+
|   Apriori   |
+-------------+
Minimum Support: 0.25
Minimum Confidence: 0.8
Transactions:
    [1, 2, 3, 4]
    [1, 2, 4]
    [1, 2]
    [2, 3, 4]
    [2, 3]
    [3, 4]
    [2, 4]
Frequent Itemsets:
    [1, 2, 3, 4, [1, 2], [1, 4], [2, 3], [2, 4], [3, 4], [1, 2, 4], [2, 3, 4]]
Rules:
    1 -> 2 (support: 0.43, confidence: 1.0)
    4 -> 2 (support: 0.57, confidence: 0.8)
    [1, 4] -> 2 (support: 0.29, confidence: 1.0)
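
The support and confidence values in the output above can be checked directly from the listed transactions. The sketch below uses the standard definitions (support as the fraction of transactions containing an itemset, confidence as the support of the whole rule divided by the support of its antecedent). It enumerates itemsets by brute force instead of using Apriori's candidate pruning, and the helper names are hypothetical.

```python
from itertools import combinations

transactions = [
    {1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


# Brute-force enumeration of itemsets that meet the minimum support of 0.25.
items = sorted(set().union(*transactions))
frequent = [
    list(c)
    for n in range(1, len(items) + 1)
    for c in combinations(items, n)
    if support(c) >= 0.25
]

# (support, confidence) for the three rules reported above.
rules = [({1}, {2}), ({4}, {2}), ({1, 4}, {2})]
summary = [
    (round(support(a | c), 2), round(confidence(a, c), 2)) for a, c in rules
]
print(frequent)
print(summary)
```

The enumeration yields the same eleven frequent itemsets as the listing above, and the three rules come out at (0.43, 1.0), (0.57, 0.8), and (0.29, 1.0).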

Implementations

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Deep Learning

Contact

If there's some implementation you would like to see here or if you're just feeling social, feel free to email me or connect with me on LinkedIn.