Top Related Projects
scikit-learn: machine learning in Python
Statsmodels: statistical modeling and econometrics in Python
sktime: A unified framework for machine learning with time series
Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
Quick Overview
tslearn is a Python library for time series analysis and machine learning. It provides tools for preprocessing, clustering, and classification of time series data, as well as implementations of various algorithms specifically designed for time series tasks. The library is built on top of scikit-learn, making it familiar and easy to use for those already acquainted with the scikit-learn ecosystem.
Pros
- Comprehensive set of tools for time series analysis and machine learning
- Compatible with scikit-learn, allowing for easy integration into existing workflows
- Efficient implementations of time series-specific algorithms
- Well-documented with examples and tutorials
Cons
- Limited support for multivariate time series
- Some advanced features may have a steeper learning curve
- Smaller community compared to more general-purpose machine learning libraries
- May require additional dependencies for certain functionalities
Code Examples
- Time series clustering using K-means:
from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets
# Load example dataset
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
# Perform K-means clustering
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = km.fit_predict(X_train)
- Dynamic Time Warping (DTW) distance calculation:
from tslearn.metrics import dtw
# Define two time series
ts1 = [1, 2, 3, 4, 5]
ts2 = [1, 1, 2, 3, 4, 5]
# Calculate DTW distance
distance = dtw(ts1, ts2)
print(f"DTW distance: {distance}")
- Time series classification using 1-Nearest Neighbor:
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
# Scale the data
scaler = TimeSeriesScalerMeanVariance()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train and predict using 1-NN classifier
knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw")
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)
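Since tslearn estimators follow the scikit-learn API, the predictions can be evaluated with standard scikit-learn metrics. A short sketch, reusing y_test and y_pred from the classification example above:
from sklearn.metrics import accuracy_score
# Compare predicted labels against the held-out test labels
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.2f}")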
Getting Started
To get started with tslearn, follow these steps:
1. Install tslearn using pip:
pip install tslearn
2. Import the necessary modules:
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.clustering import TimeSeriesKMeans
3. Load and preprocess a dataset:
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
scaler = TimeSeriesScalerMeanVariance()
X_train_scaled = scaler.fit_transform(X_train)
4. Perform a time series analysis task (e.g., clustering):
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = km.fit_predict(X_train_scaled)
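Once the model is fitted, the results can be inspected directly. A minimal follow-up sketch using the km, labels and X_train_scaled objects from the steps above (matplotlib is an assumed extra, not a tslearn requirement):
import matplotlib.pyplot as plt
# Cluster centers are themselves time series, with shape (n_clusters, sz, d)
print(km.cluster_centers_.shape)
# Plot each series, colored by its assigned cluster
for ts, label in zip(X_train_scaled, labels):
    plt.plot(ts.ravel(), color=f"C{label}", alpha=0.4)
plt.title("Trace dataset, colored by DTW k-means cluster")
plt.show()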
Competitor Comparisons
scikit-learn: machine learning in Python
Pros of scikit-learn
- Comprehensive machine learning library with a wide range of algorithms and tools
- Large and active community, extensive documentation, and frequent updates
- Well-established and widely adopted in industry and academia
Cons of scikit-learn
- Not specifically designed for time series data, lacking specialized time series algorithms
- Can be complex for beginners due to its extensive feature set
- May require additional libraries for specific time series tasks
Code Comparison
scikit-learn (general-purpose machine learning):
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
tslearn (time series-specific):
from tslearn.clustering import TimeSeriesKMeans
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
km.fit(X_train)
tslearn is specifically designed for time series data, offering specialized algorithms and distance measures like Dynamic Time Warping (DTW). It provides a more focused toolset for time series analysis, making it easier to work with temporal data. However, scikit-learn offers a broader range of machine learning algorithms and is more versatile for general-purpose tasks. The choice between the two depends on the specific requirements of your project and the nature of your data.
Statsmodels: statistical modeling and econometrics in Python
Pros of statsmodels
- Broader scope, covering a wide range of statistical models and econometric tools
- More extensive documentation and user community
- Integrates well with other scientific Python libraries like NumPy and Pandas
Cons of statsmodels
- Steeper learning curve due to its comprehensive nature
- May be overkill for simple time series analysis tasks
- Less focused on machine learning-oriented time series tasks
Code Comparison
statsmodels:
import statsmodels.api as sm
model = sm.tsa.ARIMA(data, order=(1,1,1))
results = model.fit()
forecast = results.forecast(steps=5)
tslearn:
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.clustering import TimeSeriesKMeans
scaler = TimeSeriesScalerMeanVariance()
scaled_data = scaler.fit_transform(data)
kmeans = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = kmeans.fit_predict(scaled_data)
Summary
statsmodels is a comprehensive statistical library with a broad scope, while tslearn focuses specifically on machine learning for time series data. statsmodels offers more traditional statistical models and econometric tools, making it suitable for a wide range of statistical analyses. tslearn, on the other hand, provides specialized algorithms for time series clustering, classification, and preprocessing, which may be more appropriate for specific machine learning tasks involving time series data.
sktime: A unified framework for machine learning with time series
Pros of sktime
- More comprehensive, covering a wider range of time series tasks including forecasting, classification, and regression
- Better integration with the scikit-learn ecosystem
- More active development and larger community support
Cons of sktime
- Steeper learning curve due to its more complex architecture
- Potentially slower execution for simpler time series tasks
- Less focus on specific time series clustering algorithms
Code Comparison
sktime example:
from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster
y = load_airline()
forecaster = NaiveForecaster(strategy="mean")
forecaster.fit(y)
y_pred = forecaster.predict(fh=[1, 2, 3])
tslearn example:
from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
km.fit(X_train)
Both libraries offer powerful tools for time series analysis, but sktime provides a more comprehensive suite of algorithms and better integration with scikit-learn. tslearn, on the other hand, excels in specific areas like time series clustering and offers a simpler API for certain tasks.
Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Pros of Prophet
- More user-friendly and accessible for non-experts
- Handles missing data and outliers automatically
- Provides built-in forecasting components (e.g., holidays, seasonality)
Cons of Prophet
- Less flexible for custom time series algorithms
- Limited to forecasting tasks, not general time series analysis
- May be slower for large datasets compared to tslearn
Code Comparison
Prophet:
from prophet import Prophet  # package renamed from fbprophet as of Prophet 1.0
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)
tslearn:
from tslearn.clustering import TimeSeriesKMeans
model = TimeSeriesKMeans(n_clusters=3, metric="dtw")
model.fit(X)
labels = model.labels_
Prophet focuses on forecasting with a simple API, while tslearn offers a broader range of time series algorithms, including clustering as shown in the example. Prophet's code is more straightforward for forecasting tasks, while tslearn provides more flexibility for various time series analyses.
pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
Pros of pmdarima
- Specialized in time series forecasting, particularly ARIMA models
- Offers automated model selection and hyperparameter tuning
- Provides comprehensive documentation and examples
Cons of pmdarima
- More limited in scope compared to tslearn's broader time series toolkit
- Less suitable for general-purpose time series analysis tasks
- Smaller community and fewer contributors
Code Comparison
tslearn example:
from tslearn.clustering import TimeSeriesKMeans
kmeans = TimeSeriesKMeans(n_clusters=3, metric="dtw")
kmeans.fit(X_train)
pmdarima example:
from pmdarima import auto_arima
model = auto_arima(y, start_p=1, start_q=1, max_p=5, max_q=5)
forecasts = model.predict(n_periods=10)
tslearn focuses on various time series algorithms, including clustering, while pmdarima specializes in ARIMA modeling and forecasting. tslearn offers a broader range of tools for time series analysis, making it more versatile for different tasks. pmdarima, on the other hand, excels in automated ARIMA modeling, providing a more streamlined approach for specific forecasting needs.
Both libraries have their strengths, and the choice between them depends on the specific requirements of your time series analysis project.
README
tslearn
The machine learning toolkit for time series analysis in Python
Section | Description |
---|---|
Installation | Installing the dependencies and tslearn |
Getting started | A quick introduction on how to use tslearn |
Available features | An extensive overview of tslearn's functionalities |
Documentation | A link to our API reference and a gallery of examples |
Contributing | A guide for heroes willing to contribute |
Citation | A citation for tslearn for scholarly articles |
Installation
There are different alternatives to install tslearn:
- PyPi:
python -m pip install tslearn
- Conda:
conda install -c conda-forge tslearn
- Git:
python -m pip install https://github.com/tslearn-team/tslearn/archive/main.zip
In order for the installation to be successful, the required dependencies must be installed. For a more detailed guide on how to install tslearn, please see the Documentation.
Getting started
1. Getting the data in the right format
tslearn expects a time series dataset to be formatted as a 3D numpy array. The three dimensions correspond to the number of time series, the number of measurements per time series and the number of dimensions, respectively (n_ts, max_sz, d). In order to get the data in the right format, different solutions exist:
- You can use utility functions such as to_time_series_dataset.
- You can convert from other popular time series toolkits in Python.
- You can load any of the UCR datasets in the required format.
- You can generate synthetic data using the generators module.
It should further be noted that tslearn supports variable-length time series.
>>> from tslearn.utils import to_time_series_dataset
>>> my_first_time_series = [1, 3, 4, 2]
>>> my_second_time_series = [1, 2, 4, 2]
>>> my_third_time_series = [1, 2, 4, 2, 2]
>>> X = to_time_series_dataset([my_first_time_series,
my_second_time_series,
my_third_time_series])
>>> y = [0, 1, 1]
2. Data preprocessing and transformations
Optionally, tslearn has several utilities to preprocess the data. In order to facilitate the convergence of different algorithms, you can scale time series. Alternatively, in order to speed up training times, you can resample the data or apply a piecewise transformation (see the short sketch after the scaling example below).
>>> from tslearn.preprocessing import TimeSeriesScalerMinMax
>>> X_scaled = TimeSeriesScalerMinMax().fit_transform(X)
>>> print(X_scaled)
[[[0.] [0.667] [1.] [0.333] [nan]]
[[0.] [0.333] [1.] [0.333] [nan]]
[[0.] [0.333] [1.] [0.333] [0.333]]]
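The resampling and piecewise transformations mentioned above follow the same fit_transform API. Below is a minimal sketch reusing the X dataset built in step 1; the target length of 10 points and the 5 segments are arbitrary values chosen for illustration.
>>> from tslearn.preprocessing import TimeSeriesResampler
>>> from tslearn.piecewise import PiecewiseAggregateApproximation
>>> # Resample every series (including variable-length ones) to a fixed length of 10
>>> X_resampled = TimeSeriesResampler(sz=10).fit_transform(X)
>>> # Piecewise Aggregate Approximation: summarize each series by 5 segment means
>>> X_paa = PiecewiseAggregateApproximation(n_segments=5).fit_transform(X_resampled)
Both transformers return a 3D dataset in the same (n_ts, sz, d) layout, so their output can be fed directly to the models below.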
3. Training a model
After getting the data in the right format, a model can be trained. Depending on the use case, tslearn supports different tasks: classification, clustering and regression. For an extensive overview of possibilities, check out our gallery of examples.
>>> from tslearn.neighbors import KNeighborsTimeSeriesClassifier
>>> knn = KNeighborsTimeSeriesClassifier(n_neighbors=1)
>>> knn.fit(X_scaled, y)
>>> print(knn.predict(X_scaled))
[0 1 1]
As can be seen, the models in tslearn follow the same API as those of the well-known scikit-learn. Moreover, they are fully compatible with it, which allows you to use scikit-learn utilities such as hyper-parameter tuning and pipelines.
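For instance, a tslearn estimator can be dropped straight into scikit-learn's pipeline and model-selection tools. The sketch below reuses X and y from step 1; the parameter grid and the leave-one-out cross-validation are illustrative choices only (chosen because this toy dataset has just three series).
>>> from sklearn.model_selection import GridSearchCV, LeaveOneOut
>>> from sklearn.pipeline import Pipeline
>>> from tslearn.preprocessing import TimeSeriesScalerMinMax
>>> from tslearn.neighbors import KNeighborsTimeSeriesClassifier
>>> # Chain a tslearn scaler and classifier in a standard scikit-learn Pipeline
>>> pipe = Pipeline([("scale", TimeSeriesScalerMinMax()),
                     ("knn", KNeighborsTimeSeriesClassifier())])
>>> # Tune the number of neighbors with scikit-learn's GridSearchCV
>>> search = GridSearchCV(pipe, param_grid={"knn__n_neighbors": [1, 2]},
                          cv=LeaveOneOut())
>>> search.fit(X, y)
>>> print(search.best_params_)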
4. More analyses
tslearn further allows other types of analysis to be performed. Examples include computing the barycenter of a group of time series or computing pairwise distances between time series using a variety of distance metrics.
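As a quick sketch of both, again reusing the X dataset from step 1 (dtw_barycenter_averaging and cdist_dtw are the relevant helpers in tslearn.barycenters and tslearn.metrics):
>>> from tslearn.barycenters import dtw_barycenter_averaging
>>> from tslearn.metrics import cdist_dtw
>>> # DTW barycenter: an "average" series under Dynamic Time Warping
>>> barycenter = dtw_barycenter_averaging(X)
>>> # Pairwise DTW distance matrix between all series in the dataset
>>> distances = cdist_dtw(X)
>>> print(distances.shape)
(3, 3)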
Available features
| data | processing | clustering | classification | regression | metrics |
|---|---|---|---|---|---|
| UCR Datasets | Scaling | TimeSeriesKMeans | KNN Classifier | KNN Regressor | Dynamic Time Warping |
| Generators | Piecewise | KShape | TimeSeriesSVC | TimeSeriesSVR | Global Alignment Kernel |
| Conversion(1, 2) | | KernelKmeans | LearningShapelets | MLP | Barycenters |
| | | | Early Classification | | Matrix Profile |
Documentation
The documentation is hosted at readthedocs. It includes an API reference, a gallery of examples and a user guide.
Contributing
If you would like to contribute to tslearn, please have a look at our contribution guidelines. A list of interesting TODOs can be found here. If you want other ML methods for time series to be added to this TODO list, do not hesitate to open an issue!
Referencing tslearn
If you use tslearn in a scientific publication, we would appreciate citations:
@article{JMLR:v21:20-091,
author = {Romain Tavenard and Johann Faouzi and Gilles Vandewiele and
Felix Divo and Guillaume Androz and Chester Holtz and
Marie Payne and Roman Yurchak and Marc Ru{\ss}wurm and
Kushal Kolar and Eli Woods},
title = {Tslearn, A Machine Learning Toolkit for Time Series Data},
journal = {Journal of Machine Learning Research},
year = {2020},
volume = {21},
number = {118},
pages = {1-6},
url = {http://jmlr.org/papers/v21/20-091.html}
}
Acknowledgments
The authors would like to thank Mathieu Blondel for providing code for Kernel k-means and Soft-DTW, and Mehran Maghoumi for his torch-compatible implementation of SoftDTW.