scikit-learn

scikit-learn: machine learning in Python

Quick Overview

Scikit-learn is a popular open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and more. Scikit-learn is designed to be efficient, scalable, and easy to use, making it a go-to choice for both beginners and experienced data scientists.

Pros

  • Comprehensive Algorithms: Scikit-learn provides a wide range of state-of-the-art machine learning algorithms, covering a diverse set of tasks and use cases.
  • Ease of Use: The library has a user-friendly API and excellent documentation, making it accessible for both novice and experienced users.
  • Performance: Scikit-learn is built on NumPy and SciPy, with performance-critical routines implemented in Cython, which keeps most algorithms fast on datasets that fit in memory.
  • Active Community: The project has a large and active community of contributors, ensuring regular updates, bug fixes, and new feature additions.
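The uniform API mentioned in the list above can be illustrated with a short sketch. It assumes the bundled iris dataset and an arbitrary choice of scaler and classifier; any other estimator could be substituted in the same position.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset (150 samples, 4 features, 3 classes)
X, y = load_iris(return_X_y=True)

# Chain preprocessing and a classifier into a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation; the whole pipeline is refit on each training fold
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean accuracy: {scores.mean():.2f}")
```

Because the pipeline itself behaves like an estimator, the scaler is fitted only on the training portion of each fold, which avoids leaking test data into preprocessing.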

Cons

  • Limited Deep Learning Support: While Scikit-learn is excellent for traditional machine learning tasks, it has limited support for deep learning compared to specialized libraries like TensorFlow or PyTorch.
  • Steep Learning Curve for Beginners: The breadth of algorithms and features in Scikit-learn can be overwhelming for beginners, requiring a significant investment in learning the library.
  • Lack of Interpretability: Some of the more complex models in Scikit-learn, such as random forests and gradient boosting, can be difficult to interpret, which can be a drawback in certain applications.
  • Dependency on Other Libraries: Scikit-learn relies on other scientific computing libraries like NumPy and SciPy, which can add complexity for users who are not familiar with the Python data science ecosystem.
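On the interpretability point above, scikit-learn does ship some inspection tools that partly offset it. The following is a minimal sketch, assuming synthetic regression data and arbitrary settings, that compares a random forest's impurity-based importances with permutation importances.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: 5 features, of which only 2 carry signal
X, y = make_regression(n_samples=300, n_features=5, n_informative=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances are available on the fitted forest
print("Impurity-based:", model.feature_importances_.round(3))

# Permutation importance: drop in test score when one feature is shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)
print("Permutation:", result.importances_mean.round(3))
```

Neither measure explains individual predictions, but both give a rough ranking of which inputs the model relies on.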

Code Examples

Here are a few code examples demonstrating the usage of Scikit-learn:

  1. Classification with Support Vector Machines (SVM):
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Generate sample data
X, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X_train, y_train)

# Evaluate the model on the test set
accuracy = clf.score(X_test, y_test)
print(f'Accuracy: {accuracy:.2f}')
  2. Clustering with K-Means:
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate sample data
X, y = make_blobs(n_samples=500, centers=4, n_features=2, random_state=42)

# Perform K-Means clustering
kmeans = KMeans(n_clusters=4, random_state=42)
labels = kmeans.fit_predict(X)

# Visualize the clustering results
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='x', s=200, c='red')
plt.title('K-Means Clustering')
plt.show()
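  3. Regression with Random Forest:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Generate sample data
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest regressor (assumed completion; the original example is cut off here)
reg = RandomForestRegressor(n_estimators=100, random_state=42)
reg.fit(X_train, y_train)

# Evaluate the model on the test set (score() returns R^2 for regressors)
r2 = reg.score(X_test, y_test)
print(f'R^2: {r2:.2f}')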

Competitor Comparisons

186,879

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • More powerful for deep learning and neural networks
  • Better support for distributed computing and GPU acceleration
  • Flexible ecosystem with tools like TensorBoard for visualization

Cons of TensorFlow

  • Steeper learning curve and more complex API
  • More overhead for simple machine learning tasks, where a scikit-learn estimator is typically quicker to set up and train
  • Larger library size and longer setup time

Code Comparison

TensorFlow example (neural network):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

scikit-learn example (random forest):

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

TensorFlow is better suited for complex deep learning tasks, while scikit-learn excels in traditional machine learning algorithms. TensorFlow offers more flexibility and power but requires more expertise, whereas scikit-learn provides a simpler, more intuitive API for quick prototyping and smaller-scale projects. The choice between the two depends on the specific requirements of your machine learning task and your level of expertise.

85,015

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • More flexible and dynamic computational graph
  • Better support for GPU acceleration and distributed computing
  • Easier to debug and understand due to its pythonic nature

Cons of PyTorch

  • Steeper learning curve for beginners
  • Smaller ecosystem of pre-built models and tools
  • Less suitable for traditional machine learning tasks

Code Comparison

scikit-learn:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

PyTorch:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # Single fully connected layer mapping inputs to outputs
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.fc(x)

scikit-learn is more concise for traditional machine learning tasks, while PyTorch offers more flexibility for deep learning and custom model architectures. scikit-learn provides a higher-level API, making it easier for beginners and quick prototyping. PyTorch's lower-level API allows for more control over the model's internals and computation, which is beneficial for research and complex deep learning projects.

16,597

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Pros of LightGBM

  • Faster training speed and higher efficiency, especially for large datasets
  • Lower memory usage due to its histogram-based algorithm
  • Better accuracy in many scenarios, particularly for categorical features

Cons of LightGBM

  • Less extensive documentation and community support compared to scikit-learn
  • Steeper learning curve for beginners due to more hyperparameters
  • Not as versatile for general machine learning tasks beyond gradient boosting

Code Comparison

LightGBM:

import lightgbm as lgb
model = lgb.LGBMClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

scikit-learn:

from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Both libraries offer similar ease of use for basic implementation. However, LightGBM provides more advanced options for fine-tuning performance, while scikit-learn offers a wider range of algorithms and preprocessing tools within a single package.

26,184

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

Pros of XGBoost

  • Faster training and prediction times for large datasets
  • Better handling of missing values and categorical features
  • Generally achieves higher accuracy on a wide range of problems

Cons of XGBoost

  • Less intuitive for beginners compared to scikit-learn's API
  • Requires more hyperparameter tuning to achieve optimal performance
  • Focused on gradient boosting (tree boosters plus a linear booster), while scikit-learn offers a broader range of algorithms

Code Comparison

XGBoost:

import xgboost as xgb
model = xgb.XGBClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

scikit-learn:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Both libraries offer similar high-level APIs for model training and prediction. However, XGBoost provides more advanced features and parameters for fine-tuning gradient boosting models, while scikit-learn offers a wider variety of algorithms and a more consistent API across different model types.

XGBoost is generally preferred for competitions and when maximum performance is required, while scikit-learn is often chosen for its ease of use, extensive documentation, and broader range of algorithms for various machine learning tasks.
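The consistent API across model types mentioned above can be sketched as a loop over unrelated estimators that all share the same `fit` and `score` calls. The data is synthetic and the model choices and settings are arbitrary assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Four different algorithm families behind one interface
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # same call for every estimator
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: {scores[name]:.2f}")
```

`xgb.XGBClassifier` follows the same convention, so it could be added to the dictionary without changing the loop.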

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Pros of CatBoost

  • Handles categorical features automatically without preprocessing
  • Generally faster training and prediction times, especially on GPU
  • Often achieves better performance out-of-the-box on datasets with categorical features

Cons of CatBoost

  • Less flexibility and customization options compared to scikit-learn
  • Smaller community and ecosystem of extensions/plugins
  • Limited to gradient boosting algorithms, while scikit-learn offers a wide range of ML algorithms

Code Comparison

scikit-learn:

from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

CatBoost:

from catboost import CatBoostClassifier
cat_features = [0, 1]  # hypothetical: indices of the categorical columns in X_train
model = CatBoostClassifier(cat_features=cat_features)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

The main difference in usage is that CatBoost accepts categorical features directly, while scikit-learn's GradientBoostingClassifier requires categorical variables to be encoded (e.g., one-hot encoding) before training. scikit-learn's HistGradientBoostingClassifier is an exception: it accepts categorical columns through its categorical_features parameter. CatBoost's API is designed to be similar to scikit-learn for ease of use, but with some additional parameters specific to its implementation.
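The preprocessing step described above might look like the following sketch. The data is synthetic: column 0 is a hypothetical categorical feature stored as integer codes and column 1 is numeric; the target rule is made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 400
category = rng.integers(0, 3, size=n)   # hypothetical categorical feature, 3 levels
numeric = rng.normal(size=n)            # hypothetical numeric feature
X = np.column_stack([category, numeric])
y = ((category == 2) | (numeric > 1.0)).astype(int)

# One-hot encode column 0, pass column 1 through unchanged
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

model = make_pipeline(preprocess, GradientBoostingClassifier(random_state=0))
model.fit(X, y)
train_accuracy = model.score(X, y)
print(f"Training accuracy: {train_accuracy:.2f}")
```

Keeping the encoder inside the pipeline means the same transformation is applied automatically at prediction time.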

62,199

Deep Learning for humans

Pros of Keras

  • Higher-level API, making it easier to build and experiment with neural networks
  • Better suited for deep learning tasks and complex neural network architectures
  • Supports multiple backend engines (TensorFlow, JAX, and PyTorch in Keras 3; older releases supported TensorFlow, Theano, and CNTK)

Cons of Keras

  • Less flexible for non-neural network machine learning tasks
  • Slower execution compared to lower-level libraries
  • Steeper learning curve for understanding underlying concepts

Code Comparison

Keras:

from keras.models import Sequential
from keras.layers import Dense, Input

model = Sequential()
model.add(Input(shape=(100,)))  # 100 input features
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Scikit-learn:

from sklearn.neural_network import MLPClassifier

model = MLPClassifier(hidden_layer_sizes=(64,), activation='relu')
model.fit(X_train, y_train)

Summary

Keras is better suited for deep learning and complex neural network architectures, while Scikit-learn offers a broader range of machine learning algorithms and is more flexible for general-purpose tasks. Keras provides a higher-level API, making it easier to build and experiment with neural networks, but may have a steeper learning curve for understanding underlying concepts. Scikit-learn, on the other hand, is more intuitive for beginners and offers faster execution for simpler models.

README

.. -*- mode: rst -*-

|Azure| |CirrusCI| |Codecov| |CircleCI| |Nightly wheels| |Black| |PythonVersion| |PyPi| |DOI| |Benchmark|

.. |Azure| image:: https://dev.azure.com/scikit-learn/scikit-learn/_apis/build/status/scikit-learn.scikit-learn?branchName=main
   :target: https://dev.azure.com/scikit-learn/scikit-learn/_build/latest?definitionId=1&branchName=main

.. |CircleCI| image:: https://circleci.com/gh/scikit-learn/scikit-learn/tree/main.svg?style=shield
   :target: https://circleci.com/gh/scikit-learn/scikit-learn

.. |CirrusCI| image:: https://img.shields.io/cirrus/github/scikit-learn/scikit-learn/main?label=Cirrus%20CI
   :target: https://cirrus-ci.com/github/scikit-learn/scikit-learn/main

.. |Codecov| image:: https://codecov.io/gh/scikit-learn/scikit-learn/branch/main/graph/badge.svg?token=Pk8G9gg3y9
   :target: https://codecov.io/gh/scikit-learn/scikit-learn

.. |Nightly wheels| image:: https://github.com/scikit-learn/scikit-learn/workflows/Wheel%20builder/badge.svg?event=schedule
   :target: https://github.com/scikit-learn/scikit-learn/actions?query=workflow%3A%22Wheel+builder%22+event%3Aschedule

.. |PythonVersion| image:: https://img.shields.io/pypi/pyversions/scikit-learn.svg
   :target: https://pypi.org/project/scikit-learn/

.. |PyPi| image:: https://img.shields.io/pypi/v/scikit-learn
   :target: https://pypi.org/project/scikit-learn

.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

.. |DOI| image:: https://zenodo.org/badge/21369/scikit-learn/scikit-learn.svg
   :target: https://zenodo.org/badge/latestdoi/21369/scikit-learn/scikit-learn

.. |Benchmark| image:: https://img.shields.io/badge/Benchmarked%20by-asv-blue
   :target: https://scikit-learn.org/scikit-learn-benchmarks

.. |PythonMinVersion| replace:: 3.9
.. |NumPyMinVersion| replace:: 1.19.5
.. |SciPyMinVersion| replace:: 1.6.0
.. |JoblibMinVersion| replace:: 1.2.0
.. |ThreadpoolctlMinVersion| replace:: 3.1.0
.. |MatplotlibMinVersion| replace:: 3.3.4
.. |Scikit-ImageMinVersion| replace:: 0.17.2
.. |PandasMinVersion| replace:: 1.1.5
.. |SeabornMinVersion| replace:: 0.9.0
.. |PytestMinVersion| replace:: 7.1.2
.. |PlotlyMinVersion| replace:: 5.14.0

.. image:: https://raw.githubusercontent.com/scikit-learn/scikit-learn/main/doc/logos/scikit-learn-logo.png
   :target: https://scikit-learn.org/

scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.

The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the `About us <https://scikit-learn.org/dev/about.html#authors>`__ page for a list of core contributors.

It is currently maintained by a team of volunteers.

Website: https://scikit-learn.org

Installation
------------

Dependencies
~~~~~~~~~~~~

scikit-learn requires:

- Python (>= |PythonMinVersion|)
- NumPy (>= |NumPyMinVersion|)
- SciPy (>= |SciPyMinVersion|)
- joblib (>= |JoblibMinVersion|)
- threadpoolctl (>= |ThreadpoolctlMinVersion|)

=======

**Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4.**
scikit-learn 1.0 and later require Python 3.7 or newer.
scikit-learn 1.1 and later require Python 3.8 or newer.

Scikit-learn plotting capabilities (i.e., functions start with ``plot_`` and
classes end with ``Display``) require Matplotlib (>= |MatplotlibMinVersion|).
For running the examples Matplotlib >= |MatplotlibMinVersion| is required.
A few examples require scikit-image >= |Scikit-ImageMinVersion|, a few examples
require pandas >= |PandasMinVersion|, some examples require seaborn >=
|SeabornMinVersion| and plotly >= |PlotlyMinVersion|.

User installation
~~~~~~~~~~~~~~~~~

If you already have a working installation of NumPy and SciPy, the easiest way to install scikit-learn is using ``pip``::

    pip install -U scikit-learn

or ``conda``::

    conda install -c conda-forge scikit-learn

The documentation includes more detailed `installation instructions <https://scikit-learn.org/stable/install.html>`_.
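
After installing, one quick way to confirm that the package imports and to see which dependency versions were picked up is sketched below. ``sklearn.show_versions()`` is part of scikit-learn; its output depends on the local environment, so none is shown here.

```python
import sklearn

# Print the installed scikit-learn version string
print(sklearn.__version__)

# Print versions of Python, scikit-learn, and its main dependencies
sklearn.show_versions()
```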

Changelog
---------

See the `changelog <https://scikit-learn.org/dev/whats_new.html>`__ for a history of notable changes to scikit-learn.

Development
-----------

We welcome new contributors of all experience levels. The scikit-learn community goals are to be helpful, welcoming, and effective. The `Development Guide <https://scikit-learn.org/stable/developers/index.html>`_ has detailed information about contributing code, documentation, tests, and more. We've included some basic information in this README.

Important links
~~~~~~~~~~~~~~~

- Official source code repo: https://github.com/scikit-learn/scikit-learn
- Download releases: https://pypi.org/project/scikit-learn/
- Issue tracker: https://github.com/scikit-learn/scikit-learn/issues

Source code
~~~~~~~~~~~

You can check the latest sources with the command::

    git clone https://github.com/scikit-learn/scikit-learn.git

Contributing
~~~~~~~~~~~~

To learn more about making a contribution to scikit-learn, please see our
`Contributing guide
<https://scikit-learn.org/dev/developers/contributing.html>`_.

Testing
~~~~~~~

After installation, you can launch the test suite from outside the source
directory (you will need to have ``pytest`` >= |PyTestMinVersion| installed)::

    pytest sklearn

See the web page https://scikit-learn.org/dev/developers/contributing.html#testing-and-improving-test-coverage
for more information.

    Random number generation can be controlled during testing by setting
    the ``SKLEARN_SEED`` environment variable.

Submitting a Pull Request

Before opening a Pull Request, have a look at the full Contributing page to make sure your code complies with our guidelines: https://scikit-learn.org/stable/developers/index.html

Project History
---------------

The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the `About us <https://scikit-learn.org/dev/about.html#authors>`__ page for a list of core contributors.

The project is currently maintained by a team of volunteers.

Note: scikit-learn was previously referred to as scikits.learn.

Help and Support
----------------

Documentation
~~~~~~~~~~~~~

- HTML documentation (stable release): https://scikit-learn.org
- HTML documentation (development version): https://scikit-learn.org/dev/
- FAQ: https://scikit-learn.org/stable/faq.html

Communication
~~~~~~~~~~~~~

Citation
~~~~~~~~

If you use scikit-learn in a scientific publication, we would appreciate citations: https://scikit-learn.org/stable/about.html#citing-scikit-learn