ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Top Related Projects
- scikit-learn: machine learning in Python
- TensorFlow: An Open Source Machine Learning Framework for Everyone
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- Keras: Deep Learning for humans
- ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
- handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Quick Overview
ML-From-Scratch is a comprehensive collection of machine learning algorithms implemented from scratch in Python. It aims to provide clear, concise implementations of various ML models, serving as both an educational resource and a practical toolkit for those looking to understand the inner workings of these algorithms.
Pros
- Offers a wide range of ML algorithms, from basic to advanced
- Implementations are clear and well-documented, making them excellent for learning
- Includes visualizations and examples for better understanding
- Pure Python/NumPy implementations that don't rely on heavy ML frameworks
Cons
- Not optimized for production use or large-scale datasets
- May lack some of the latest cutting-edge algorithms
- Implementations are slower than optimized libraries such as scikit-learn
- Limited support for GPU acceleration
Code Examples
- Linear Regression:
from mlfromscratch.supervised_learning import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
- K-Means Clustering:
from mlfromscratch.unsupervised_learning import KMeans
kmeans = KMeans(k=3)
kmeans.fit(X)
clusters = kmeans.predict(X)
- Neural Network (in this library the optimizer and loss are passed as objects, and activations are separate layers):
from mlfromscratch.deep_learning import NeuralNetwork
from mlfromscratch.deep_learning.layers import Dense, Activation
from mlfromscratch.deep_learning.optimizers import Adam
from mlfromscratch.deep_learning.loss_functions import CrossEntropy
model = NeuralNetwork(optimizer=Adam(), loss=CrossEntropy)
model.add(Dense(16, input_shape=(X.shape[1],)))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.fit(X_train, y_train, n_epochs=100, batch_size=32)
Getting Started
To get started with ML-From-Scratch:
1. Clone the repository:

   git clone https://github.com/eriklindernoren/ML-From-Scratch.git

2. Install the required dependencies:

   pip install -r requirements.txt

3. Import and use the desired algorithm:

   from mlfromscratch.supervised_learning import RandomForest

   model = RandomForest()
   model.fit(X_train, y_train)
   predictions = model.predict(X_test)
For more detailed examples and usage instructions, refer to the individual algorithm implementations in the repository.
Competitor Comparisons
scikit-learn: machine learning in Python
Pros of scikit-learn
- Comprehensive and well-established library with a wide range of machine learning algorithms
- Highly optimized for performance, utilizing efficient implementations and Cython
- Extensive documentation, community support, and integration with other scientific Python libraries
Cons of scikit-learn
- Less suitable for educational purposes or understanding the inner workings of algorithms
- More complex codebase, making it harder to contribute or modify for specific needs
- Larger dependency footprint and potentially slower import times
Code Comparison
ML-From-Scratch implementation of k-means clustering:
def _calculate_cluster_means(self, X, clusters):
    return np.array([X[clusters == i].mean(axis=0) for i in range(self.k)])
scikit-learn implementation of k-means clustering:
def _kmeans_single_lloyd(X, n_clusters, max_iter=300, init='k-means++',
                         verbose=False, x_squared_norms=None,
                         random_state=None, tol=1e-4,
                         precompute_distances=True):
    # ... (more complex implementation)
ML-From-Scratch focuses on simplicity and readability, while scikit-learn prioritizes performance and robustness.
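For readers who want to see the whole loop rather than a single helper, here is a minimal sketch of Lloyd's algorithm in plain NumPy (our illustration, taken from neither library; it assumes no cluster ever becomes empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # init from random points
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        clusters = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its cluster
        new_centroids = np.array([X[clusters == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return clusters, centroids
```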
TensorFlow: An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Highly optimized for performance and scalability
- Extensive ecosystem with pre-trained models and tools
- Strong industry adoption and community support
Cons of TensorFlow
- Steeper learning curve for beginners
- More complex setup and configuration
- Abstracts away low-level details, which can hinder understanding
Code Comparison
ML-From-Scratch (Neural Network implementation):
import numpy as np

class NeuralNetwork():
    def __init__(self, n_hidden, n_features, n_output):
        self.W1 = np.random.randn(n_features, n_hidden)
        self.W2 = np.random.randn(n_hidden, n_output)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def forward(self, X):
        self.z1 = np.dot(X, self.W1)
        self.a1 = self.sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2)
        return self.sigmoid(self.z2)
TensorFlow (Neural Network implementation):
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden, activation='sigmoid', input_shape=(n_features,)),
    tf.keras.layers.Dense(n_output, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=32)
The ML-From-Scratch implementation provides a more transparent view of the neural network's inner workings, while TensorFlow offers a higher-level API that simplifies model creation and training at the cost of some abstraction.
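To make the "inner workings" point concrete, here is a hedged sketch of one gradient-descent step for the two-layer network above, assuming sigmoid activations and a mean-squared-error loss (our illustration; the repository's training code is more general):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_step(net, X, y, lr=0.1):
    # Forward pass, mirroring NeuralNetwork.forward above
    a1 = sigmoid(X.dot(net.W1))
    y_hat = sigmoid(a1.dot(net.W2))
    # Backward pass for L = mean((y_hat - y)^2), using sigmoid'(z) = s(z) * (1 - s(z))
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)     # error signal at the output
    delta1 = delta2.dot(net.W2.T) * a1 * (1 - a1)  # error signal at the hidden layer
    net.W2 -= lr * a1.T.dot(delta2) / len(X)       # gradient descent on each weight matrix
    net.W1 -= lr * X.T.dot(delta1) / len(X)
    return float(np.mean((y_hat - y) ** 2))
```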
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Highly optimized and production-ready framework
- Extensive ecosystem with pre-built models and tools
- Dynamic computational graphs for flexible model development
Cons of PyTorch
- Steeper learning curve for beginners
- Larger codebase and more complex architecture
- Less focus on educational aspects of machine learning algorithms
Code Comparison
ML-From-Scratch (Neural Network implementation):
import numpy as np

class NeuralNetwork():
    def __init__(self, n_hidden, n_features, n_classes):
        self.W = np.random.randn(n_hidden, n_features)
        self.b = np.zeros((n_hidden, 1))
        self.V = np.random.randn(n_classes, n_hidden)
        self.c = np.zeros((n_classes, 1))
PyTorch (Neural Network implementation):
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self, n_hidden, n_features, n_classes):
        super().__init__()
        self.hidden = nn.Linear(n_features, n_hidden)
        self.output = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        x = F.relu(self.hidden(x))
        return self.output(x)
ML-From-Scratch focuses on implementing algorithms from scratch, providing a clear understanding of the underlying mathematics. PyTorch, on the other hand, offers a high-level API with built-in optimizations, making it more suitable for large-scale projects and research. While ML-From-Scratch is excellent for learning, PyTorch is the go-to choice for professional development and cutting-edge machine learning applications.
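For completeness, a minimal sketch of how the PyTorch model above would typically be trained, assuming X is a FloatTensor of features and y a LongTensor of class labels (standard PyTorch API, not code from either repository):

```python
import torch
import torch.nn as nn

model = NeuralNetwork(n_hidden=64, n_features=4, n_classes=3)  # class defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # applies softmax internally, so forward() returns raw logits

for epoch in range(100):
    optimizer.zero_grad()          # reset gradients accumulated in the previous step
    loss = criterion(model(X), y)  # forward pass plus loss
    loss.backward()                # autograd computes gradients for all parameters
    optimizer.step()               # apply the parameter update
```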
Keras: Deep Learning for humans
Pros of Keras
- High-level API for easy and fast prototyping of neural networks
- Extensive documentation and large community support
- Seamless integration with TensorFlow backend
Cons of Keras
- Less flexibility for implementing custom algorithms
- Abstraction may hide some low-level details
Code Comparison
ML-From-Scratch implementation of a simple neural network:
class NeuralNetwork():
    def __init__(self, layers):
        self.layers = layers
        self.parameters = self._initialize_parameters()

    def _initialize_parameters(self):
        ...  # Parameter initialization code
Keras implementation of a similar neural network:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential([
    Dense(64, activation='relu', input_shape=(input_dim,)),
    Dense(32, activation='relu'),
    Dense(output_dim, activation='softmax')
])
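To actually run the Keras model it still needs to be compiled and fit; a short sketch using standard Keras calls (X_train, y_train, input_dim and output_dim are assumed):

```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=32)
```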
ML-From-Scratch provides a more detailed, low-level implementation, allowing users to understand the inner workings of neural networks. Keras, on the other hand, offers a more concise and user-friendly approach, abstracting away many implementation details.
While ML-From-Scratch is excellent for learning and understanding machine learning algorithms, Keras is more suitable for practical, production-ready applications due to its efficiency and extensive ecosystem.
ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Pros of ML-For-Beginners
- Comprehensive curriculum structure with lessons, assignments, and quizzes
- Covers a wide range of ML topics, including ethics and real-world applications
- Includes visualizations and interactive elements to enhance learning
Cons of ML-For-Beginners
- Less focus on implementing algorithms from scratch
- May not provide as deep an understanding of the mathematical foundations
- Primarily uses high-level libraries rather than building core components
Code Comparison
ML-From-Scratch (implementing linear regression):
import numpy as np

class LinearRegression():
    def fit(self, X, y):
        # Closed-form solution via the normal equation: theta = (X^T X)^-1 X^T y
        X = np.insert(X, 0, 1, axis=1)  # prepend a bias column of ones
        self.theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

    def predict(self, X):
        X = np.insert(X, 0, 1, axis=1)
        return X.dot(self.theta)
ML-For-Beginners (using scikit-learn for linear regression):
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
ML-From-Scratch focuses on implementing algorithms from the ground up, providing a deeper understanding of the underlying mathematics. ML-For-Beginners offers a more structured learning experience with a broader coverage of ML topics, but relies more on existing libraries for implementation.
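As a quick sanity check that the two approaches agree, one can fit both on synthetic data (our illustration; assumes NumPy and scikit-learn are installed, and uses the from-scratch class defined above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression as SklearnLinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3   # known coefficients and intercept

scratch = LinearRegression()                # the normal-equation class above
scratch.fit(X, y)
sk = SklearnLinearRegression().fit(X, y)

# theta[0] is the intercept because a bias column was prepended in fit()
assert np.allclose(scratch.theta[0], sk.intercept_)
assert np.allclose(scratch.theta[1:], sk.coef_)
```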
handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Pros of handson-ml2
- Comprehensive coverage of machine learning concepts with practical examples
- Utilizes popular libraries like TensorFlow and Scikit-learn for real-world applications
- Includes Jupyter notebooks for interactive learning and experimentation
Cons of handson-ml2
- Less focus on implementing algorithms from scratch
- May not provide as deep an understanding of the underlying mathematics
Code Comparison
ML-From-Scratch (Neural Network implementation):
class NeuralNetwork():
    def __init__(self, n_hidden, n_iterations=3000, learning_rate=0.01):
        self.n_hidden = n_hidden
        self.n_iterations = n_iterations
        self.learning_rate = learning_rate
        self.hidden_activation = Sigmoid()
        self.output_activation = Sigmoid()
handson-ml2 (Neural Network using TensorFlow):
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=[8]),
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(1)
])
The ML-From-Scratch example implements a neural network class from scratch, while handson-ml2 uses Keras (part of TensorFlow) to create a neural network model with pre-built layers and functions.
README
Machine Learning From Scratch
About
Python implementations of some of the fundamental Machine Learning models and algorithms from scratch.
The purpose of this project is not to produce as optimized and computationally efficient algorithms as possible but rather to present the inner workings of them in a transparent and accessible way.
Table of Contents
- About
- Installation
- Examples
- Implementations
- Contact
Installation
$ git clone https://github.com/eriklindernoren/ML-From-Scratch
$ cd ML-From-Scratch
$ python setup.py install
Examples
Polynomial Regression
$ python mlfromscratch/examples/polynomial_regression.py
Figure: Training progress of a regularized polynomial regression model fitting
temperature data measured in Linköping, Sweden 2016.
Classification With CNN
$ python mlfromscratch/examples/convolutional_neural_network.py
+---------+
| ConvNet |
+---------+
Input Shape: (1, 8, 8)
+----------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+----------------------+------------+--------------+
| Conv2D | 160 | (16, 8, 8) |
| Activation (ReLU) | 0 | (16, 8, 8) |
| Dropout | 0 | (16, 8, 8) |
| BatchNormalization | 2048 | (16, 8, 8) |
| Conv2D | 4640 | (32, 8, 8) |
| Activation (ReLU) | 0 | (32, 8, 8) |
| Dropout | 0 | (32, 8, 8) |
| BatchNormalization | 4096 | (32, 8, 8) |
| Flatten | 0 | (2048,) |
| Dense | 524544 | (256,) |
| Activation (ReLU) | 0 | (256,) |
| Dropout | 0 | (256,) |
| BatchNormalization | 512 | (256,) |
| Dense | 2570 | (10,) |
| Activation (Softmax) | 0 | (10,) |
+----------------------+------------+--------------+
Total Parameters: 538570
Training: 100% [------------------------------------------------------------------------] Time: 0:01:55
Accuracy: 0.987465181058
Figure: Classification of the digit dataset using CNN.
Density-Based Clustering
$ python mlfromscratch/examples/dbscan.py
Figure: Clustering of the moons dataset using DBSCAN.
Generating Handwritten Digits
$ python mlfromscratch/unsupervised_learning/generative_adversarial_network.py
+-----------+
| Generator |
+-----------+
Input Shape: (100,)
+------------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense | 25856 | (256,) |
| Activation (LeakyReLU) | 0 | (256,) |
| BatchNormalization | 512 | (256,) |
| Dense | 131584 | (512,) |
| Activation (LeakyReLU) | 0 | (512,) |
| BatchNormalization | 1024 | (512,) |
| Dense | 525312 | (1024,) |
| Activation (LeakyReLU) | 0 | (1024,) |
| BatchNormalization | 2048 | (1024,) |
| Dense | 803600 | (784,) |
| Activation (TanH) | 0 | (784,) |
+------------------------+------------+--------------+
Total Parameters: 1489936
+---------------+
| Discriminator |
+---------------+
Input Shape: (784,)
+------------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense | 401920 | (512,) |
| Activation (LeakyReLU) | 0 | (512,) |
| Dropout | 0 | (512,) |
| Dense | 131328 | (256,) |
| Activation (LeakyReLU) | 0 | (256,) |
| Dropout | 0 | (256,) |
| Dense | 514 | (2,) |
| Activation (Softmax) | 0 | (2,) |
+------------------------+------------+--------------+
Total Parameters: 533762
Figure: Training progress of a Generative Adversarial Network generating
handwritten digits.
Deep Reinforcement Learning
$ python mlfromscratch/examples/deep_q_network.py
+----------------+
| Deep Q-Network |
+----------------+
Input Shape: (4,)
+-------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+-------------------+------------+--------------+
| Dense | 320 | (64,) |
| Activation (ReLU) | 0 | (64,) |
| Dense | 130 | (2,) |
+-------------------+------------+--------------+
Total Parameters: 450
Figure: Deep Q-Network solution to the CartPole-v1 environment in OpenAI gym.
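The summary above describes a two-layer Q-network with a 4-dimensional state input and two actions; as a hedged illustration (not the repository's code), epsilon-greedy action selection over such a network might look like:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def select_action(state, W1, b1, W2, b2, epsilon=0.1):
    # Forward pass: state (4,) -> hidden (64,) -> Q-values (2,), matching the summary above
    q_values = relu(state @ W1 + b1) @ W2 + b2
    # Epsilon-greedy: explore with probability epsilon, otherwise act greedily
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))
```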
Image Reconstruction With RBM
$ python mlfromscratch/examples/restricted_boltzmann_machine.py
Figure: Shows how the network gets better during training at reconstructing
the digit 2 in the MNIST dataset.
Evolutionary Evolved Neural Network
$ python mlfromscratch/examples/neuroevolution.py
+---------------+
| Model Summary |
+---------------+
Input Shape: (64,)
+----------------------+------------+--------------+
| Layer Type | Parameters | Output Shape |
+----------------------+------------+--------------+
| Dense | 1040 | (16,) |
| Activation (ReLU) | 0 | (16,) |
| Dense | 170 | (10,) |
| Activation (Softmax) | 0 | (10,) |
+----------------------+------------+--------------+
Total Parameters: 1210
Population Size: 100
Generations: 3000
Mutation Rate: 0.01
[0 Best Individual - Fitness: 3.08301, Accuracy: 10.5%]
[1 Best Individual - Fitness: 3.08746, Accuracy: 12.0%]
...
[2999 Best Individual - Fitness: 94.08513, Accuracy: 98.5%]
Test set accuracy: 96.7%
Figure: Classification of the digit dataset by a neural network evolved through neuroevolution.
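The mutation rate reported above governs how often individual weights are perturbed between generations; one common formulation is Gaussian perturbation, sketched here as an illustration rather than the repository's exact operator:

```python
import numpy as np

def mutate_weights(weights, mutation_rate=0.01, scale=0.1):
    # Perturb each weight with probability mutation_rate by adding Gaussian noise
    mask = np.random.rand(*weights.shape) < mutation_rate
    return weights + mask * np.random.normal(scale=scale, size=weights.shape)
```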
Genetic Algorithm
$ python mlfromscratch/examples/genetic_algorithm.py
+--------+
| GA |
+--------+
Description: Implementation of a Genetic Algorithm which aims to produce
the user specified target string. This implementation calculates each
candidate's fitness based on the alphabetical distance between the candidate
and the target. A candidate is selected as a parent with probabilities proportional
to the candidate's fitness. Reproduction is implemented as a single-point
crossover between pairs of parents. Mutation is done by randomly assigning
new characters with uniform probability.
Parameters
----------
Target String: 'Genetic Algorithm'
Population Size: 100
Mutation Rate: 0.05
[0 Closest Candidate: 'CJqlJguPlqzvpoJmb', Fitness: 0.00]
[1 Closest Candidate: 'MCxZxdr nlfiwwGEk', Fitness: 0.01]
[2 Closest Candidate: 'MCxZxdm nlfiwwGcx', Fitness: 0.01]
[3 Closest Candidate: 'SmdsAklMHn kBIwKn', Fitness: 0.01]
[4 Closest Candidate: ' lotneaJOasWfu Z', Fitness: 0.01]
...
[292 Closest Candidate: 'GeneticaAlgorithm', Fitness: 1.00]
[293 Closest Candidate: 'GeneticaAlgorithm', Fitness: 1.00]
[294 Answer: 'Genetic Algorithm']
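As described above, the algorithm needs fitness evaluation, fitness-proportional parent selection, single-point crossover, and uniform random mutation. A compact, hedged sketch of those four pieces (our illustration of the described scheme, not the repository's exact code):

```python
import random
import string

TARGET = "Genetic Algorithm"
LETTERS = string.ascii_letters + " "

def fitness(candidate):
    # Alphabetical distance to the target, mapped so that closer = larger fitness
    dist = sum(abs(ord(a) - ord(b)) for a, b in zip(candidate, TARGET))
    return 1 / (dist + 1)

def select_parent(population, scores):
    # Selection probability proportional to fitness
    return random.choices(population, weights=scores, k=1)[0]

def crossover(parent1, parent2):
    # Single-point crossover between a pair of parents
    point = random.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]

def mutate(candidate, rate=0.05):
    # Replace each character with a uniformly random one with probability `rate`
    return "".join(random.choice(LETTERS) if random.random() < rate else c
                   for c in candidate)
```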
Association Analysis
$ python mlfromscratch/examples/apriori.py
+-------------+
| Apriori |
+-------------+
Minimum Support: 0.25
Minimum Confidence: 0.8
Transactions:
[1, 2, 3, 4]
[1, 2, 4]
[1, 2]
[2, 3, 4]
[2, 3]
[3, 4]
[2, 4]
Frequent Itemsets:
[1, 2, 3, 4, [1, 2], [1, 4], [2, 3], [2, 4], [3, 4], [1, 2, 4], [2, 3, 4]]
Rules:
1 -> 2 (support: 0.43, confidence: 1.0)
4 -> 2 (support: 0.57, confidence: 0.8)
[1, 4] -> 2 (support: 0.29, confidence: 1.0)
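The support and confidence figures in the output can be verified by hand. For example, for the rule 1 -> 2 over the seven transactions listed, a minimal sketch (our illustration, not the repository's implementation):

```python
transactions = [
    [1, 2, 3, 4], [1, 2, 4], [1, 2], [2, 3, 4], [2, 3], [3, 4], [2, 4],
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(set(itemset) <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent) = support(A u C) / support(A)
    return support(antecedent + consequent) / support(antecedent)

print(round(support([1, 2]), 2))   # 0.43, matching the rule 1 -> 2 above
print(confidence([1], [2]))        # 1.0: every transaction containing 1 also contains 2
```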
Implementations
Supervised Learning
- Adaboost
- Bayesian Regression
- Decision Tree
- Elastic Net
- Gradient Boosting
- K Nearest Neighbors
- Lasso Regression
- Linear Discriminant Analysis
- Linear Regression
- Logistic Regression
- Multi-class Linear Discriminant Analysis
- Multilayer Perceptron
- Naive Bayes
- Neuroevolution
- Particle Swarm Optimization of Neural Network
- Perceptron
- Polynomial Regression
- Random Forest
- Ridge Regression
- Support Vector Machine
- XGBoost
Unsupervised Learning
- Apriori
- Autoencoder
- DBSCAN
- FP-Growth
- Gaussian Mixture Model
- Generative Adversarial Network
- Genetic Algorithm
- K-Means
- Partitioning Around Medoids
- Principal Component Analysis
- Restricted Boltzmann Machine
Reinforcement Learning
- Deep Q-Network
Deep Learning
- Neural Network
- Layers
  - Activation Layer
  - Average Pooling Layer
  - Batch Normalization Layer
  - Constant Padding Layer
  - Convolutional Layer
  - Dropout Layer
  - Flatten Layer
  - Fully-Connected (Dense) Layer
  - Fully-Connected RNN Layer
  - Max Pooling Layer
  - Reshape Layer
  - Up Sampling Layer
  - Zero Padding Layer
- Model Types
Contact
If there's some implementation you would like to see here or if you're just feeling social, feel free to email me or connect with me on LinkedIn.