Top Related Projects
scikit-learn: machine learning in Python
Statsmodels: statistical modeling and econometrics in Python
sktime: A unified framework for machine learning with time series
Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
Quick Overview
tslearn is a Python library for time series analysis and machine learning. It provides tools for preprocessing, clustering, and classification of time series data, as well as implementations of various algorithms specifically designed for time series tasks. The library is built on top of scikit-learn, making it familiar and easy to use for those already acquainted with the scikit-learn ecosystem.
Pros
- Comprehensive set of tools for time series analysis and machine learning
- Compatible with scikit-learn, allowing for easy integration into existing workflows
- Efficient implementations of time series-specific algorithms
- Well-documented with examples and tutorials
Cons
- Limited support for multivariate time series
- Some advanced features may have a steeper learning curve
- Smaller community compared to more general-purpose machine learning libraries
- May require additional dependencies for certain functionalities
Code Examples
- Time series clustering using K-means:
from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets
# Load example dataset
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
# Perform K-means clustering
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = km.fit_predict(X_train)
- Dynamic Time Warping (DTW) distance calculation:
from tslearn.metrics import dtw
# Define two time series
ts1 = [1, 2, 3, 4, 5]
ts2 = [1, 1, 2, 3, 4, 5]
# Calculate DTW distance
distance = dtw(ts1, ts2)
print(f"DTW distance: {distance}")
- Time series classification using 1-Nearest Neighbor:
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
# Scale the data
scaler = TimeSeriesScalerMeanVariance()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train and predict using 1-NN classifier
knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw")
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)
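Since tslearn estimators follow the scikit-learn API, the predictions can be evaluated with standard scikit-learn metrics. A short sketch, reusing y_test and y_pred from the classification example above:
from sklearn.metrics import accuracy_score
# Compare predicted labels against the held-out test labels
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.2f}")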
Getting Started
To get started with tslearn, follow these steps:
1. Install tslearn using pip:
pip install tslearn
2. Import the necessary modules:
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.clustering import TimeSeriesKMeans
3. Load and preprocess a dataset:
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
scaler = TimeSeriesScalerMeanVariance()
X_train_scaled = scaler.fit_transform(X_train)
4. Perform a time series analysis task (e.g., clustering):
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = km.fit_predict(X_train_scaled)
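Once the model is fitted, the results can be inspected directly. A minimal follow-up sketch using the km, labels and X_train_scaled objects from the steps above (matplotlib is an assumed extra, not a tslearn requirement):
import matplotlib.pyplot as plt
# Cluster centers are themselves time series, with shape (n_clusters, sz, d)
print(km.cluster_centers_.shape)
# Plot each series, colored by its assigned cluster
for ts, label in zip(X_train_scaled, labels):
    plt.plot(ts.ravel(), color=f"C{label}", alpha=0.4)
plt.title("Trace dataset, colored by DTW k-means cluster")
plt.show()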
Competitor Comparisons
scikit-learn: machine learning in Python
Pros of scikit-learn
- Comprehensive machine learning library with a wide range of algorithms and tools
- Large and active community, extensive documentation, and frequent updates
- Well-established and widely adopted in industry and academia
Cons of scikit-learn
- Not specifically designed for time series data, lacking specialized time series algorithms
- Can be complex for beginners due to its extensive feature set
- May require additional libraries for specific time series tasks
Code Comparison
scikit-learn (general-purpose machine learning):
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
tslearn (time series-specific):
from tslearn.clustering import TimeSeriesKMeans
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
km.fit(X_train)
tslearn is specifically designed for time series data, offering specialized algorithms and distance measures like Dynamic Time Warping (DTW). It provides a more focused toolset for time series analysis, making it easier to work with temporal data. However, scikit-learn offers a broader range of machine learning algorithms and is more versatile for general-purpose tasks. The choice between the two depends on the specific requirements of your project and the nature of your data.
Statsmodels: statistical modeling and econometrics in Python
Pros of statsmodels
- Broader scope, covering a wide range of statistical models and econometric tools
- More extensive documentation and user community
- Integrates well with other scientific Python libraries like NumPy and Pandas
Cons of statsmodels
- Steeper learning curve due to its comprehensive nature
- May be overkill for simple time series analysis tasks
- Less focused on machine learning-oriented time series tasks
Code Comparison
statsmodels:
import statsmodels.api as sm
model = sm.tsa.ARIMA(data, order=(1,1,1))
results = model.fit()
forecast = results.forecast(steps=5)
tslearn:
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.clustering import TimeSeriesKMeans
scaler = TimeSeriesScalerMeanVariance()
scaled_data = scaler.fit_transform(data)
kmeans = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = kmeans.fit_predict(scaled_data)
Summary
statsmodels is a comprehensive statistical library with a broad scope, while tslearn focuses specifically on machine learning for time series data. statsmodels offers more traditional statistical models and econometric tools, making it suitable for a wide range of statistical analyses. tslearn, on the other hand, provides specialized algorithms for time series clustering, classification, and preprocessing, which may be more appropriate for specific machine learning tasks involving time series data.
sktime: A unified framework for machine learning with time series
Pros of sktime
- More comprehensive, covering a wider range of time series tasks including forecasting, classification, and regression
- Better integration with the scikit-learn ecosystem
- More active development and larger community support
Cons of sktime
- Steeper learning curve due to its more complex architecture
- Potentially slower execution for simpler time series tasks
- Less focus on specific time series clustering algorithms
Code Comparison
sktime example:
from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster
y = load_airline()
forecaster = NaiveForecaster(strategy="mean")
forecaster.fit(y)
y_pred = forecaster.predict(fh=[1, 2, 3])
tslearn example:
from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
km.fit(X_train)
Both libraries offer powerful tools for time series analysis, but sktime provides a more comprehensive suite of algorithms and better integration with scikit-learn. tslearn, on the other hand, excels in specific areas like time series clustering and offers a simpler API for certain tasks.
Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Pros of Prophet
- More user-friendly and accessible for non-experts
- Handles missing data and outliers automatically
- Provides built-in forecasting components (e.g., holidays, seasonality)
Cons of Prophet
- Less flexible for custom time series algorithms
- Limited to forecasting tasks, not general time series analysis
- May be slower for large datasets compared to tslearn
Code Comparison
Prophet:
from prophet import Prophet  # package renamed from fbprophet as of Prophet 1.0
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)
tslearn:
from tslearn.clustering import TimeSeriesKMeans
model = TimeSeriesKMeans(n_clusters=3, metric="dtw")
model.fit(X)
labels = model.labels_
Prophet focuses on forecasting with a simple API, while tslearn offers a broader range of time series algorithms, including clustering as shown in the example. Prophet's code is more straightforward for forecasting tasks, while tslearn provides more flexibility for various time series analyses.
pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
Pros of pmdarima
- Specialized in time series forecasting, particularly ARIMA models
- Offers automated model selection and hyperparameter tuning
- Provides comprehensive documentation and examples
Cons of pmdarima
- More limited in scope compared to tslearn's broader time series toolkit
- Less suitable for general-purpose time series analysis tasks
- Smaller community and fewer contributors
Code Comparison
tslearn example:
from tslearn.clustering import TimeSeriesKMeans
kmeans = TimeSeriesKMeans(n_clusters=3, metric="dtw")
kmeans.fit(X_train)
pmdarima example:
from pmdarima import auto_arima
model = auto_arima(y, start_p=1, start_q=1, max_p=5, max_q=5)
forecasts = model.predict(n_periods=10)
tslearn focuses on various time series algorithms, including clustering, while pmdarima specializes in ARIMA modeling and forecasting. tslearn offers a broader range of tools for time series analysis, making it more versatile for different tasks. pmdarima, on the other hand, excels in automated ARIMA modeling, providing a more streamlined approach for specific forecasting needs.
Both libraries have their strengths, and the choice between them depends on the specific requirements of your time series analysis project.
README
tslearn
The machine learning toolkit for time series analysis in Python
Section | Description |
---|---|
Installation | Installing the dependencies and tslearn |
Getting started | A quick introduction on how to use tslearn |
Available features | An extensive overview of tslearn's functionalities |
Documentation | A link to our API reference and a gallery of examples |
Contributing | A guide for heroes willing to contribute |
Citation | A citation for tslearn for scholarly articles |
Installation
There are different alternatives to install tslearn:
- PyPi:
python -m pip install tslearn
- Conda:
conda install -c conda-forge tslearn
- Git:
python -m pip install https://github.com/tslearn-team/tslearn/archive/main.zip
In order for the installation to be successful, the required dependencies must be installed. For a more detailed guide on how to install tslearn, please see the Documentation.
Getting started
1. Getting the data in the right format
tslearn expects a time series dataset to be formatted as a 3D numpy array. The three dimensions correspond to the number of time series, the number of measurements per time series and the number of dimensions, respectively (n_ts, max_sz, d). In order to get the data in the right format, different solutions exist:
- You can use utility functions such as to_time_series_dataset.
- You can convert from other popular time series toolkits in Python.
- You can load any of the UCR datasets in the required format.
- You can generate synthetic data using the generators module.
It should further be noted that tslearn supports variable-length time series.
>>> from tslearn.utils import to_time_series_dataset
>>> my_first_time_series = [1, 3, 4, 2]
>>> my_second_time_series = [1, 2, 4, 2]
>>> my_third_time_series = [1, 2, 4, 2, 2]
>>> X = to_time_series_dataset([my_first_time_series,
my_second_time_series,
my_third_time_series])
>>> y = [0, 1, 1]
2. Data preprocessing and transformations
Optionally, tslearn has several utilities to preprocess the data. In order to facilitate the convergence of different algorithms, you can scale time series. Alternatively, in order to speed up training times, you can resample the data or apply a piecewise transformation (see the short sketch after the scaling example below).
>>> from tslearn.preprocessing import TimeSeriesScalerMinMax
>>> X_scaled = TimeSeriesScalerMinMax().fit_transform(X)
>>> print(X_scaled)
[[[0.] [0.667] [1.] [0.333] [nan]]
[[0.] [0.333] [1.] [0.333] [nan]]
[[0.] [0.333] [1.] [0.333] [0.333]]]
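The resampling and piecewise transformations mentioned above follow the same fit_transform API. Below is a minimal sketch reusing the X dataset built in step 1; the target length of 10 points and the 5 segments are arbitrary values chosen for illustration.
>>> from tslearn.preprocessing import TimeSeriesResampler
>>> from tslearn.piecewise import PiecewiseAggregateApproximation
>>> # Resample every series (including variable-length ones) to a fixed length of 10
>>> X_resampled = TimeSeriesResampler(sz=10).fit_transform(X)
>>> # Piecewise Aggregate Approximation: summarize each series by 5 segment means
>>> X_paa = PiecewiseAggregateApproximation(n_segments=5).fit_transform(X_resampled)
Both transformers return a 3D dataset in the same (n_ts, sz, d) layout, so their output can be fed directly to the models below.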
3. Training a model
After getting the data in the right format, a model can be trained. Depending on the use case, tslearn supports different tasks: classification, clustering and regression. For an extensive overview of possibilities, check out our gallery of examples.
>>> from tslearn.neighbors import KNeighborsTimeSeriesClassifier
>>> knn = KNeighborsTimeSeriesClassifier(n_neighbors=1)
>>> knn.fit(X_scaled, y)
>>> print(knn.predict(X_scaled))
[0 1 1]
As can be seen, the models in tslearn follow the same API as those of the well-known scikit-learn. Moreover, they are fully compatible with it, which allows you to use scikit-learn utilities such as hyper-parameter tuning and pipelines.
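For instance, a tslearn estimator can be dropped straight into scikit-learn's pipeline and model-selection tools. The sketch below reuses X and y from step 1; the parameter grid and the leave-one-out cross-validation are illustrative choices only (chosen because this toy dataset has just three series).
>>> from sklearn.model_selection import GridSearchCV, LeaveOneOut
>>> from sklearn.pipeline import Pipeline
>>> from tslearn.preprocessing import TimeSeriesScalerMinMax
>>> from tslearn.neighbors import KNeighborsTimeSeriesClassifier
>>> # Chain a tslearn scaler and classifier in a standard scikit-learn Pipeline
>>> pipe = Pipeline([("scale", TimeSeriesScalerMinMax()),
                     ("knn", KNeighborsTimeSeriesClassifier())])
>>> # Tune the number of neighbors with scikit-learn's GridSearchCV
>>> search = GridSearchCV(pipe, param_grid={"knn__n_neighbors": [1, 2]},
                          cv=LeaveOneOut())
>>> search.fit(X, y)
>>> print(search.best_params_)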
4. More analyses
tslearn further allows other types of analysis to be performed. Examples include computing the barycenter of a group of time series or computing pairwise distances between time series using a variety of distance metrics.
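As a quick sketch of both, again reusing the X dataset from step 1 (dtw_barycenter_averaging and cdist_dtw are the relevant helpers in tslearn.barycenters and tslearn.metrics):
>>> from tslearn.barycenters import dtw_barycenter_averaging
>>> from tslearn.metrics import cdist_dtw
>>> # DTW barycenter: an "average" series under Dynamic Time Warping
>>> barycenter = dtw_barycenter_averaging(X)
>>> # Pairwise DTW distance matrix between all series in the dataset
>>> distances = cdist_dtw(X)
>>> print(distances.shape)
(3, 3)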
Available features
| data | processing | clustering | classification | regression | metrics |
|---|---|---|---|---|---|
| UCR Datasets | Scaling | TimeSeriesKMeans | KNN Classifier | KNN Regressor | Dynamic Time Warping |
| Generators | Piecewise | KShape | TimeSeriesSVC | TimeSeriesSVR | Global Alignment Kernel |
| Conversion(1, 2) | | KernelKmeans | LearningShapelets | MLP | Barycenters |
| | | | Early Classification | | Matrix Profile |
Documentation
The documentation is hosted at readthedocs. It includes an API reference, a gallery of examples and a user guide.
Contributing
If you would like to contribute to tslearn, please have a look at our contribution guidelines. A list of interesting TODOs can be found here. If you want other ML methods for time series to be added to this TODO list, do not hesitate to open an issue!
Referencing tslearn
If you use tslearn in a scientific publication, we would appreciate citations:
@article{JMLR:v21:20-091,
author = {Romain Tavenard and Johann Faouzi and Gilles Vandewiele and
Felix Divo and Guillaume Androz and Chester Holtz and
Marie Payne and Roman Yurchak and Marc Ru{\ss}wurm and
Kushal Kolar and Eli Woods},
title = {Tslearn, A Machine Learning Toolkit for Time Series Data},
journal = {Journal of Machine Learning Research},
year = {2020},
volume = {21},
number = {118},
pages = {1-6},
url = {http://jmlr.org/papers/v21/20-091.html}
}
Acknowledgments
The authors would like to thank Mathieu Blondel for providing code for Kernel k-means and Soft-DTW, and Mehran Maghoumi for his torch-compatible implementation of SoftDTW.