
tslearn

The machine learning toolkit for time series analysis in Python


Top Related Projects

scikit-learn: machine learning in Python

Statsmodels: statistical modeling and econometrics in Python

sktime: A unified framework for machine learning with time series

Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

Quick Overview

tslearn is a Python library for time series analysis and machine learning. It provides tools for preprocessing, clustering, and classification of time series data, as well as implementations of various algorithms specifically designed for time series tasks. The library is built on top of scikit-learn, making it familiar and easy to use for those already acquainted with the scikit-learn ecosystem.

Pros

  • Comprehensive set of tools for time series analysis and machine learning
  • Compatible with scikit-learn, allowing for easy integration into existing workflows
  • Efficient implementations of time series-specific algorithms
  • Well-documented with examples and tutorials

Cons

  • Limited support for multivariate time series
  • Some advanced features may have a steeper learning curve
  • Smaller community compared to more general-purpose machine learning libraries
  • May require additional dependencies for certain functionalities

Code Examples

  1. Time series clustering using K-means:
from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets

# Load example dataset
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")

# Perform K-means clustering
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = km.fit_predict(X_train)
  2. Dynamic Time Warping (DTW) distance calculation:
from tslearn.metrics import dtw

# Define two time series
ts1 = [1, 2, 3, 4, 5]
ts2 = [1, 1, 2, 3, 4, 5]

# Calculate DTW distance
distance = dtw(ts1, ts2)
print(f"DTW distance: {distance}")
  3. Time series classification using 1-Nearest Neighbor:
from tslearn.datasets import CachedDatasets
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

# Load the same example dataset as in the clustering example
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")

# Scale each series to zero mean and unit variance
scaler = TimeSeriesScalerMeanVariance()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train and predict using 1-NN classifier
knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw")
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)
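
The scaling step in the classification example standardises each series on its own. As a rough illustration of what that means, here is a pure-Python sketch (not tslearn's implementation; the helper name `z_normalize` is hypothetical, and using the population standard deviation is an assumption of this sketch):

```python
import math

def z_normalize(series, eps=1e-12):
    """Return a copy of `series` with zero mean and unit (population) std.

    Hypothetical helper for illustration only. Constant series are
    returned as all zeros to avoid dividing by zero.
    """
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std < eps:
        return [0.0] * n
    return [(x - mean) / std for x in series]

scaled = z_normalize([1.0, 2.0, 3.0, 4.0, 5.0])
print([round(v, 3) for v in scaled])
```

Standardising per series removes differences in offset and amplitude, so that a distance such as DTW compares shapes rather than absolute levels.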

Getting Started

To get started with tslearn, follow these steps:

  1. Install tslearn using pip:

    pip install tslearn
    
  2. Import the necessary modules:

    from tslearn.datasets import CachedDatasets
    from tslearn.preprocessing import TimeSeriesScalerMeanVariance
    from tslearn.clustering import TimeSeriesKMeans
    
  3. Load and preprocess a dataset:

    X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
    scaler = TimeSeriesScalerMeanVariance()
    X_train_scaled = scaler.fit_transform(X_train)
    
  4. Perform a time series analysis task (e.g., clustering):

    km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
    labels = km.fit_predict(X_train_scaled)
    

Competitor Comparisons

scikit-learn: machine learning in Python

Pros of scikit-learn

  • Comprehensive machine learning library with a wide range of algorithms and tools
  • Large and active community, extensive documentation, and frequent updates
  • Well-established and widely adopted in industry and academia

Cons of scikit-learn

  • Not specifically designed for time series data, lacking specialized time series algorithms
  • Can be complex for beginners due to its extensive feature set
  • May require additional libraries for specific time series tasks

Code Comparison

scikit-learn (general-purpose machine learning):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

tslearn (time series-specific):

from tslearn.clustering import TimeSeriesKMeans

km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
km.fit(X_train)

tslearn is specifically designed for time series data, offering specialized algorithms and distance measures like Dynamic Time Warping (DTW). It provides a more focused toolset for time series analysis, making it easier to work with temporal data. However, scikit-learn offers a broader range of machine learning algorithms and is more versatile for general-purpose tasks. The choice between the two depends on the specific requirements of your project and the nature of your data.
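
To make the DTW distance mentioned above more concrete, the following is a minimal pure-Python sketch of the classic dynamic-programming formulation for univariate series. It is not tslearn's code: the function name `dtw_distance` is hypothetical, and taking the square root of the accumulated squared differences is a convention assumed here.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW for 1-D series.

    The local cost is the squared difference; the result is the square
    root of the accumulated cost along the cheapest warping path.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j],      # repeat b[j-1]
                                   acc[i][j - 1],      # repeat a[i-1]
                                   acc[i - 1][j - 1])  # advance both
    return math.sqrt(acc[n][m])

ts1 = [1, 2, 3, 4, 5]
ts2 = [1, 1, 2, 3, 4, 5]
print(dtw_distance(ts1, ts2))  # 0.0: the repeated 1 is absorbed by warping
```

A point-by-point Euclidean distance is not even defined for these two series because their lengths differ, which is one reason warping-based measures are popular for time series.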

Statsmodels: statistical modeling and econometrics in Python

Pros of statsmodels

  • Broader scope, covering a wide range of statistical models and econometric tools
  • More extensive documentation and user community
  • Integrates well with other scientific Python libraries like NumPy and Pandas

Cons of statsmodels

  • Steeper learning curve due to its comprehensive nature
  • May be overkill for simple time series analysis tasks
  • Less focused on machine learning-oriented time series tasks

Code Comparison

statsmodels:

import statsmodels.api as sm
model = sm.tsa.ARIMA(data, order=(1,1,1))
results = model.fit()
forecast = results.forecast(steps=5)

tslearn:

from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.clustering import TimeSeriesKMeans
scaler = TimeSeriesScalerMeanVariance()
scaled_data = scaler.fit_transform(data)
kmeans = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = kmeans.fit_predict(scaled_data)

Summary

statsmodels is a comprehensive statistical library with a broad scope, while tslearn focuses specifically on machine learning for time series data. statsmodels offers more traditional statistical models and econometric tools, making it suitable for a wide range of statistical analyses. tslearn, on the other hand, provides specialized algorithms for time series clustering, classification, and preprocessing, which may be more appropriate for specific machine learning tasks involving time series data.

sktime: A unified framework for machine learning with time series

Pros of sktime

  • More comprehensive, covering a wider range of time series tasks including forecasting, classification, and regression
  • Better integration with the scikit-learn ecosystem
  • More active development and larger community support

Cons of sktime

  • Steeper learning curve due to its more complex architecture
  • Potentially slower execution for simpler time series tasks
  • Less focus on specific time series clustering algorithms

Code Comparison

sktime example:

from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster

y = load_airline()
forecaster = NaiveForecaster(strategy="mean")
forecaster.fit(y)
y_pred = forecaster.predict(fh=[1, 2, 3])

tslearn example:

from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets

X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
km.fit(X_train)

Both libraries offer powerful tools for time series analysis, but sktime provides a more comprehensive suite of algorithms and better integration with scikit-learn. tslearn, on the other hand, excels in specific areas like time series clustering and offers a simpler API for certain tasks.

Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Pros of Prophet

  • More user-friendly and accessible for non-experts
  • Handles missing data and outliers automatically
  • Provides built-in forecasting components (e.g., holidays, seasonality)

Cons of Prophet

  • Less flexible for custom time series algorithms
  • Limited to forecasting tasks, not general time series analysis
  • May be slower for large datasets compared to tslearn

Code Comparison

Prophet:

from prophet import Prophet  # the package was named fbprophet before version 1.0
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)

tslearn:

from tslearn.clustering import TimeSeriesKMeans
model = TimeSeriesKMeans(n_clusters=3, metric="dtw")
model.fit(X)
labels = model.labels_

Prophet focuses on forecasting with a simple API, while tslearn offers a broader range of time series algorithms, including clustering as shown in the example. Prophet's code is more straightforward for forecasting tasks, while tslearn provides more flexibility for various time series analyses.

pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

Pros of pmdarima

  • Specialized in time series forecasting, particularly ARIMA models
  • Offers automated model selection and hyperparameter tuning
  • Provides comprehensive documentation and examples

Cons of pmdarima

  • More limited in scope compared to tslearn's broader time series toolkit
  • Less suitable for general-purpose time series analysis tasks
  • Smaller community and fewer contributors

Code Comparison

tslearn example:

from tslearn.clustering import TimeSeriesKMeans
kmeans = TimeSeriesKMeans(n_clusters=3, metric="dtw")
kmeans.fit(X_train)

pmdarima example:

from pmdarima import auto_arima
model = auto_arima(y, start_p=1, start_q=1, max_p=5, max_q=5)
forecasts = model.predict(n_periods=10)

tslearn focuses on various time series algorithms, including clustering, while pmdarima specializes in ARIMA modeling and forecasting. tslearn offers a broader range of tools for time series analysis, making it more versatile for different tasks. pmdarima, on the other hand, excels in automated ARIMA modeling, providing a more streamlined approach for specific forecasting needs.

Both libraries have their strengths, and the choice between them depends on the specific requirements of your time series analysis project.


README

tslearn

The machine learning toolkit for time series analysis in Python



| Section | Description |
|---|---|
| Installation | Installing the dependencies and tslearn |
| Getting started | A quick introduction on how to use tslearn |
| Available features | An extensive overview of tslearn's functionalities |
| Documentation | A link to our API reference and a gallery of examples |
| Contributing | A guide for heroes willing to contribute |
| Citation | A citation for tslearn for scholarly articles |

Installation

There are several ways to install tslearn:

  • PyPI: python -m pip install tslearn
  • Conda: conda install -c conda-forge tslearn
  • Git: python -m pip install https://github.com/tslearn-team/tslearn/archive/main.zip

In order for the installation to be successful, the required dependencies must be installed. For a more detailed guide on how to install tslearn, please see the Documentation.

Getting started

1. Getting the data in the right format

tslearn expects a time series dataset to be formatted as a 3D numpy array. The three dimensions correspond to the number of time series, the number of measurements per time series and the number of dimensions respectively (n_ts, max_sz, d).

tslearn also supports variable-length time series. One way to get the data in the right format is the to_time_series_dataset utility, used here on three series of different lengths:

>>> from tslearn.utils import to_time_series_dataset
>>> my_first_time_series = [1, 3, 4, 2]
>>> my_second_time_series = [1, 2, 4, 2]
>>> my_third_time_series = [1, 2, 4, 2, 2]
>>> X = to_time_series_dataset([my_first_time_series,
...                             my_second_time_series,
...                             my_third_time_series])
>>> y = [0, 1, 1]
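
The scaled output shown further down ends the two shorter series with nan, which suggests that variable-length series are padded to a common length. The following pure-Python sketch illustrates that idea with nested lists instead of a numpy array; `pad_to_dataset` is a hypothetical helper, not part of tslearn.

```python
def pad_to_dataset(series_list):
    """Pad univariate series with NaN into nested lists of shape (n_ts, max_sz, 1)."""
    nan = float("nan")
    max_sz = max(len(s) for s in series_list)
    dataset = []
    for s in series_list:
        padded = [float(v) for v in s] + [nan] * (max_sz - len(s))
        # Wrap each value in a list so the last axis has size d = 1.
        dataset.append([[v] for v in padded])
    return dataset

X = pad_to_dataset([[1, 3, 4, 2], [1, 2, 4, 2], [1, 2, 4, 2, 2]])
shape = (len(X), len(X[0]), len(X[0][0]))
print(shape)     # (3, 5, 1)
print(X[0][-1])  # [nan]
```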

2. Data preprocessing and transformations

tslearn provides several optional utilities to preprocess the data. To help algorithms converge, you can scale the time series. To speed up training, you can resample the data or apply a piecewise transformation.

>>> from tslearn.preprocessing import TimeSeriesScalerMinMax
>>> X_scaled = TimeSeriesScalerMinMax().fit_transform(X)
>>> print(X_scaled)
[[[0.] [0.667] [1.] [0.333] [nan]]
 [[0.] [0.333] [1.] [0.333] [nan]]
 [[0.] [0.333] [1.] [0.333] [0.333]]]
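
The first row of the output above can be reproduced by hand: each series is rescaled using its own minimum and maximum, and the padding is left alone. A minimal pure-Python sketch follows (the helper `min_max_scale` is hypothetical, and mapping a constant series to the lower bound is an assumption of this sketch, not documented tslearn behaviour):

```python
import math

def min_max_scale(series, lo=0.0, hi=1.0):
    """Rescale one series to [lo, hi], leaving NaN padding untouched."""
    valid = [v for v in series if not math.isnan(v)]
    v_min, v_max = min(valid), max(valid)
    span = v_max - v_min
    if span == 0:
        # Assumption: a constant series is mapped to the lower bound.
        return [v if math.isnan(v) else lo for v in series]
    return [v if math.isnan(v) else lo + (hi - lo) * (v - v_min) / span
            for v in series]

nan = float("nan")
scaled = min_max_scale([1, 3, 4, 2, nan])
print([round(v, 3) for v in scaled])  # [0.0, 0.667, 1.0, 0.333, nan]
```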

3. Training a model

After getting the data in the right format, a model can be trained. Depending on the use case, tslearn supports different tasks: classification, clustering and regression. For an extensive overview of possibilities, check out our gallery of examples.

>>> from tslearn.neighbors import KNeighborsTimeSeriesClassifier
>>> knn = KNeighborsTimeSeriesClassifier(n_neighbors=1)
>>> knn.fit(X_scaled, y)
>>> print(knn.predict(X_scaled))
[0 1 1]

As can be seen, the models in tslearn follow the same API as those of the well-known scikit-learn. Moreover, they are fully compatible with it, allowing you to use scikit-learn utilities such as hyper-parameter tuning and pipelines.

4. More analyses

tslearn also supports many other types of analysis. Examples include calculating barycenters of a group of time series and calculating the distances between time series using a variety of distance metrics.
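
As a simple illustration of what a barycenter is, the sketch below computes the pointwise (Euclidean) mean of equal-length series in pure Python. The helper name `pointwise_mean_barycenter` is hypothetical and this is not tslearn's implementation; barycenters under warping-based distances are more involved and are not shown.

```python
def pointwise_mean_barycenter(series_list):
    """Pointwise mean of equal-length univariate series (illustrative helper)."""
    length = len(series_list[0])
    if any(len(s) != length for s in series_list):
        raise ValueError("all series must have the same length")
    n = len(series_list)
    return [sum(s[t] for s in series_list) / n for t in range(length)]

center = pointwise_mean_barycenter([[1, 3, 4, 2], [1, 2, 4, 2]])
print(center)  # [1.0, 2.5, 4.0, 2.0]
```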

Available features

| data | processing | clustering | classification | regression | metrics |
|---|---|---|---|---|---|
| UCR Datasets | Scaling | TimeSeriesKMeans | KNN Classifier | KNN Regressor | Dynamic Time Warping |
| Generators | Piecewise | KShape | TimeSeriesSVC | TimeSeriesSVR | Global Alignment Kernel |
| Conversion (1, 2) | | KernelKmeans | LearningShapelets | MLP | Barycenters |
| | | | Early Classification | | Matrix Profile |

Documentation

The documentation is hosted at readthedocs. It includes an API reference, a gallery of examples and a user guide.

Contributing

If you would like to contribute to tslearn, please have a look at our contribution guidelines. A list of interesting TODOs can be found here. If you want other ML methods for time series to be added to this TODO list, do not hesitate to open an issue!

Referencing tslearn

If you use tslearn in a scientific publication, we would appreciate citations:

@article{JMLR:v21:20-091,
  author  = {Romain Tavenard and Johann Faouzi and Gilles Vandewiele and 
             Felix Divo and Guillaume Androz and Chester Holtz and 
             Marie Payne and Roman Yurchak and Marc Ru{\ss}wurm and 
             Kushal Kolar and Eli Woods},
  title   = {Tslearn, A Machine Learning Toolkit for Time Series Data},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {118},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/20-091.html}
}

Acknowledgments

The authors would like to thank Mathieu Blondel for providing code for Kernel k-means and Soft-DTW, and Mehran Maghoumi for his torch-compatible implementation of SoftDTW.