Top Related Projects
- Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
- sktime: A unified framework for machine learning with time series
- pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
- PyFlux: Open source time series library for Python
- Orbit: A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
- Greykite: A flexible, intuitive and fast forecasting library
Quick Overview
StatsForecast is a Python library for time series forecasting that focuses on statistical and econometric models. It provides fast and accurate implementations of popular forecasting algorithms, including ARIMA, ETS, CES, and Theta. The library is designed to be efficient, scalable, and easy to use for both beginners and advanced users.
Pros
- High performance: Utilizes Numba and NumPy for fast computations, making it suitable for large-scale forecasting tasks
- Wide range of models: Offers a variety of statistical and econometric forecasting methods
- Easy integration: Compatible with popular data science libraries like pandas and scikit-learn
- Automatic model selection: Includes features for automatic model selection and hyperparameter tuning
Cons
- Limited to statistical models: Does not include machine learning or deep learning-based forecasting methods
- Steeper learning curve: May require more statistical knowledge compared to some other forecasting libraries
- Less extensive documentation: While improving, the documentation may not be as comprehensive as more established libraries
Code Examples
- Basic forecasting with AutoARIMA:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
# df is a long-format DataFrame with columns unique_id, ds and y
model = StatsForecast(models=[AutoARIMA()], freq='D')
model.fit(df)
forecast = model.predict(h=30)
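- Fitting several models side by side:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS, Naive
models = [AutoARIMA(), AutoETS(), Naive()]
sf = StatsForecast(models=models, freq='D')
sf.fit(df)
# The result has one forecast column per model; combining them is left to the user
forecast = sf.predict(h=30)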
- Cross-validation for model evaluation:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
model = StatsForecast(models=[AutoARIMA()], freq='D')
cv_results = model.cross_validation(df=df, h=30, step_size=1, n_windows=5)
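The h, step_size and n_windows parameters above describe a rolling-origin evaluation. As a rough, library-independent sketch of that idea, the helpers below lay out the windows and score a last-value forecast. The function names and the window layout (the last window ending at the final observation) are assumptions made for illustration, not StatsForecast internals.

```python
def rolling_windows(n_obs, h, step_size, n_windows):
    """Return (train_end, test_end) index pairs for rolling-origin evaluation.

    Assumption: the last window's test set ends at the final observation and
    each earlier window is shifted back by `step_size` observations.
    """
    windows = []
    for i in range(n_windows):
        # Distance of this window's cutoff from the end of the series.
        offset = h + step_size * (n_windows - 1 - i)
        train_end = n_obs - offset
        if train_end <= 0:
            raise ValueError("series too short for the requested windows")
        windows.append((train_end, train_end + h))
    return windows


def naive_cv_mae(y, h, step_size, n_windows):
    """Mean absolute error of a last-value forecast across all windows."""
    errors = []
    for train_end, test_end in rolling_windows(len(y), h, step_size, n_windows):
        last_value = y[train_end - 1]  # forecast every step with the last training value
        errors.extend(abs(actual - last_value) for actual in y[train_end:test_end])
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(rolling_windows(n_obs=10, h=2, step_size=1, n_windows=3))
    print(naive_cv_mae(list(range(10)), h=2, step_size=1, n_windows=3))
```

Each window trains on everything before its cutoff and is scored on the next h observations, so the model is never evaluated on data it has already seen.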
Getting Started
To get started with StatsForecast, follow these steps:
- Install the library:
pip install statsforecast
- Import the necessary modules and create a sample dataset:
import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
# Create a sample dataset in long format: one row per series (unique_id)
# and timestamp (ds), with the observed value in y
dates = pd.date_range(start='2020-01-01', end='2022-12-31', freq='D')
values = np.random.randn(len(dates)).cumsum()
df = pd.DataFrame({'unique_id': 'series_1', 'ds': dates, 'y': values})
- Fit a model and generate forecasts:
model = StatsForecast(models=[AutoARIMA()], freq='D')
model.fit(df)
forecast = model.predict(h=30)
print(forecast)
Competitor Comparisons
Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Pros of Prophet
- User-friendly interface with automatic handling of seasonality and holidays
- Robust handling of missing data and outliers
- Extensive documentation and community support
Cons of Prophet
- Can be slower for large datasets or many time series
- Less flexibility for custom models or advanced statistical techniques
- May overfit on datasets with limited historical data
Code Comparison
Prophet:
from prophet import Prophet  # the package was named fbprophet before version 1.0
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)
StatsForecast:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(models=[AutoARIMA()], freq='D')
forecast = sf.forecast(df=df, h=365)
Key Differences
- Prophet focuses on an additive model with intuitive parameters, while StatsForecast offers a variety of statistical models
- StatsForecast is generally faster, especially for multiple time series
- Prophet provides built-in plotting and diagnostics, whereas StatsForecast relies more on external visualization tools
- StatsForecast offers more advanced statistical models and the ability to easily combine multiple forecasting methods
Both libraries have their strengths, with Prophet excelling in ease of use and interpretability, while StatsForecast offers more flexibility and performance for advanced users and large-scale forecasting tasks.
sktime: A unified framework for machine learning with time series
Pros of sktime
- Broader scope, covering various time series tasks beyond forecasting
- Extensive ecosystem with many algorithms and transformers
- Strong integration with scikit-learn and pandas
Cons of sktime
- Steeper learning curve due to its comprehensive nature
- Potentially slower performance for some forecasting tasks
- Less focus on probabilistic forecasting compared to StatsForecast
Code Comparison
sktime example:
from sktime.forecasting.arima import ARIMA
from sktime.datasets import load_airline
y = load_airline()
forecaster = ARIMA(order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
forecaster.fit(y)
y_pred = forecaster.predict(fh=[1, 2, 3])
StatsForecast example:
from statsforecast import StatsForecast
from statsforecast.models import ARIMA
# df is a long-format DataFrame with columns unique_id, ds and y (monthly data)
sf = StatsForecast(
    models=[ARIMA(order=(1, 1, 1), season_length=12, seasonal_order=(1, 1, 1))],
    freq='M',
)
sf.fit(df)
forecasts = sf.predict(h=3)
Both libraries offer ARIMA forecasting, but sktime provides a more scikit-learn-like API, while StatsForecast focuses on simplicity and performance for forecasting tasks.
pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
Pros of pmdarima
- More mature project with a longer history and larger user base
- Extensive documentation and examples for various use cases
- Supports a wider range of ARIMA-based models, including SARIMAX
Cons of pmdarima
- Slower performance, especially for large datasets or multiple time series
- Less focus on modern forecasting techniques beyond ARIMA-based models
- More complex API, requiring more code for basic forecasting tasks
Code Comparison
pmdarima:
from pmdarima import auto_arima
model = auto_arima(y, seasonal=True, m=12)
forecast = model.predict(n_periods=12)
StatsForecast:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(models=[AutoARIMA(season_length=12)], freq='M')
forecast = sf.forecast(df=df, h=12)
Both libraries offer AutoARIMA functionality, but StatsForecast provides a more streamlined API for working with multiple time series and integrates well with pandas DataFrames. pmdarima offers more granular control over model parameters, which can be beneficial for advanced users but may require more code for basic forecasting tasks.
PyFlux: Open source time series library for Python
Pros of PyFlux
- Offers a wider range of time series models, including ARIMA, GARCH, and state space models
- Provides Bayesian inference capabilities for parameter estimation
- Includes built-in plotting functions for model diagnostics and forecasts
Cons of PyFlux
- Less actively maintained, with the last update in 2018
- Slower performance for large datasets compared to StatsForecast
- Limited documentation and community support
Code Comparison
PyFlux:
import pyflux as pf
model = pf.ARIMA(data=df, ar=1, ma=1, target='y')
model.fit("MLE")
forecast = model.predict(h=5)
StatsForecast:
from statsforecast import StatsForecast
from statsforecast.models import ARIMA
sf = StatsForecast(models=[ARIMA(order=(1, 0, 1))], freq='D')
sf.fit(df)
forecast = sf.predict(h=5)
Both libraries offer similar functionality for time series forecasting, but StatsForecast provides a more modern and efficient implementation with better performance for large-scale forecasting tasks. PyFlux offers a broader range of models and Bayesian inference capabilities, but lacks recent updates and community support.
Orbit: A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
Pros of Orbit
- Supports Bayesian modeling, allowing for uncertainty quantification
- Offers a wider range of models, including custom model creation
- Provides built-in visualization tools for model diagnostics
Cons of Orbit
- Steeper learning curve due to more complex API
- Slower performance compared to StatsForecast's optimized implementations
- Less focus on traditional statistical models
Code Comparison
StatsForecast:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(
    models=[AutoARIMA()],
    freq='D',
    n_jobs=-1
)
forecasts = sf.forecast(df=data, h=30)
Orbit:
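from orbit.models import DLT
dlt = DLT(
    response_col='y',
    date_col='ds',
    seasonality=7,  # a single integer period (weekly here)
)
dlt.fit(df=data)
# future_df: a DataFrame holding the 30 future dates in 'ds' (construction not shown)
predictions = dlt.predict(df=future_df)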
Both libraries offer time series forecasting capabilities, but they cater to different use cases. StatsForecast focuses on traditional statistical models with high performance, while Orbit provides a more flexible framework for Bayesian modeling and custom model creation. The choice between them depends on the specific requirements of the forecasting task and the user's familiarity with different modeling approaches.
Greykite: A flexible, intuitive and fast forecasting library
Pros of Greykite
- More comprehensive feature set for advanced forecasting scenarios
- Stronger focus on interpretability and explainability of models
- Better suited for complex, multi-variate time series forecasting
Cons of Greykite
- Steeper learning curve due to more complex API and configuration options
- Slower execution times for large datasets compared to StatsForecast
- Less emphasis on traditional statistical methods
Code Comparison
StatsForecast:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(models=[AutoARIMA()], freq='D')
forecasts = sf.forecast(df=df, h=30)
Greykite:
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.forecaster import Forecaster
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df,
    config=ForecastConfig(
        model_template="AUTO",
        forecast_horizon=30
    )
)
Both libraries offer high-level APIs for forecasting, but Greykite's approach is more configurable and verbose. StatsForecast provides a simpler interface for quick forecasting tasks, while Greykite offers more control over the forecasting process and model selection.
README
Nixtla
Statistical ⚡️ Forecast
Lightning fast forecasting with statistical and econometric models
StatsForecast offers a collection of widely used univariate time series forecasting models, including automatic ARIMA, ETS, CES, and Theta modeling optimized for high performance using numba. It also includes a large battery of benchmarking models.
Installation
You can install StatsForecast with:
pip install statsforecast
or
conda install -c conda-forge statsforecast
Visit our Installation Guide for further instructions.
Quick Start
Minimal Example
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.utils import AirPassengersDF
df = AirPassengersDF
sf = StatsForecast(
    models=[AutoARIMA(season_length=12)],
    freq='M'
)
sf.fit(df)
sf.predict(h=12, level=[95])
Get Started with this quick guide.
Follow this end-to-end walkthrough for best practices.
Why?
Current Python alternatives for statistical models are slow, inaccurate and don't scale well. So we created a library that can be used to forecast in production environments or as benchmarks. StatsForecast includes an extensive battery of models that can efficiently fit millions of time series.
Features
- Fastest and most accurate implementations of AutoARIMA, AutoETS, AutoCES, MSTL and Theta in Python.
- Out-of-the-box compatibility with Spark, Dask, and Ray.
- Probabilistic Forecasting and Confidence Intervals.
- Support for exogenous variables and static covariates.
- Anomaly Detection.
- Familiar sklearn syntax: .fit and .predict.
Highlights
- Inclusion of exogenous variables and prediction intervals for ARIMA.
- 20x faster than pmdarima.
- 1.5x faster than R.
- 500x faster than Prophet.
- 4x faster than statsmodels.
- Compiled to high performance machine code through numba.
- 1,000,000 series in 30 min with ray.
- Replace FB-Prophet in two lines of code and gain speed and accuracy. Check the experiments here.
- Fit 10 benchmark models on 1,000,000 series in under 5 min.
Missing something? Please open an issue or write us in
Examples and Guides
- End to End Walkthrough: model training, evaluation and selection for multiple time series.
- Anomaly Detection: detect anomalies in time series using in-sample prediction intervals.
- Cross Validation: robust evaluation of model performance.
- Multiple Seasonalities: how to forecast data with multiple seasonalities using MSTL.
- Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.
- Intermittent Demand: forecast series with very few non-zero observations.
- Exogenous Regressors: such as weather or prices.
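The anomaly detection guide listed above relies on in-sample prediction intervals. As a minimal, library-independent sketch of that idea, the function below flags observations that fall outside a symmetric interval around the fitted values. The function name is hypothetical, and the interval assumes roughly normal, zero-mean residuals, which is a simplification.

```python
from statistics import NormalDist, stdev


def flag_anomalies(actual, fitted, level=95):
    """Flag points whose residual lies outside the `level`% in-sample interval.

    Assumption: residuals are roughly normal with zero mean, so the interval
    is fitted value +/- z * (sample standard deviation of the residuals).
    """
    residuals = [a - f for a, f in zip(actual, fitted)]
    sigma = stdev(residuals)
    # Two-sided quantile: level=95 gives z close to 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    return [abs(r) > z * sigma for r in residuals]


if __name__ == "__main__":
    actual = [10, 11, 9, 10, 30, 10, 11, 9]
    fitted = [10] * 8
    print(flag_anomalies(actual, fitted))
```

A large outlier inflates the residual standard deviation itself, so robust scale estimates may work better on very noisy series.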
Models
Automatic Forecasting
Automatic forecasting tools search for the best parameters and select the best possible model for a group of time series. These tools are useful for large collections of univariate time series.
Model | Point Forecast | Probabilistic Forecast | Insample fitted values | Probabilistic fitted values | Exogenous features |
---|---|---|---|---|---|
AutoARIMA | ✅ | ✅ | ✅ | ✅ | ✅ |
AutoETS | ✅ | ✅ | ✅ | ✅ | |
AutoCES | ✅ | ✅ | ✅ | ✅ | |
AutoTheta | ✅ | ✅ | ✅ | ✅ | |
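To make the idea of automatic model selection concrete, here is a small sketch that picks the best of several candidate forecasters for one series. It substitutes a simple holdout error for whatever criterion a given automatic model actually uses (information criteria are a common choice), and every name in it is hypothetical.

```python
def mean_forecast(train, h):
    """Forecast every step with the historical mean."""
    return [sum(train) / len(train)] * h


def last_value_forecast(train, h):
    """Forecast every step with the last observed value."""
    return [train[-1]] * h


def select_best(y, candidates, h):
    """Hold out the last h points and return the candidate with the lowest MAE."""
    train, test = y[:-h], y[-h:]
    scores = {}
    for name, forecaster in candidates.items():
        preds = forecaster(train, h)
        scores[name] = sum(abs(a - p) for a, p in zip(test, preds)) / h
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    candidates = {"mean": mean_forecast, "last_value": last_value_forecast}
    print(select_best([1, 2, 3, 4, 5, 6], candidates, h=2))
```

Running the same selection independently per series is what makes this approach practical for large collections of univariate time series.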
ARIMA Family
These models exploit the existing autocorrelations in the time series.
Model | Point Forecast | Probabilistic Forecast | Insample fitted values | Probabilistic fitted values | Exogenous features |
---|---|---|---|---|---|
ARIMA | ✅ | ✅ | ✅ | ✅ | ✅ |
AutoRegressive | ✅ | ✅ | ✅ | ✅ | ✅ |
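As a rough illustration of what exploiting autocorrelation means, the sketch below estimates the lag-1 autocorrelation of a series and uses it as the coefficient of an AR(1) forecast (a Yule-Walker style estimate). It is a teaching example with hypothetical function names, not the estimation procedure used by the ARIMA models in the table.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ar1_forecast(y, h):
    """Forecast h steps ahead with an AR(1) whose coefficient is the lag-1 autocorrelation."""
    mean = sum(y) / len(y)
    phi = autocorrelation(y, 1)
    forecasts = []
    prev = y[-1]
    for _ in range(h):
        # Each step pulls the previous value toward the mean by a factor phi.
        prev = mean + phi * (prev - mean)
        forecasts.append(prev)
    return forecasts


if __name__ == "__main__":
    series = [1, -1, 1, -1, 1, -1]
    print(autocorrelation(series, 1))
    print(ar1_forecast(series, 2))
```

With a strongly negative lag-1 autocorrelation, as in the alternating series above, the forecast keeps flipping sign while decaying toward the mean.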
Theta Family
Fit two theta lines to a deseasonalized time series, using different techniques to obtain and combine the two theta lines to produce the final forecasts.
Model | Point Forecast | Probabilistic Forecast | Insample fitted values | Probabilistic fitted values | Exogenous features |
---|---|---|---|---|---|
Theta | ✅ | ✅ | ✅ | ✅ | |
OptimizedTheta | ✅ | ✅ | ✅ | ✅ | |
DynamicTheta | ✅ | ✅ | ✅ | ✅ | |
DynamicOptimizedTheta | ✅ | ✅ | ✅ | ✅ | |
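The theta-line construction described above can be sketched without any library. This minimal illustration assumes the common definition of a theta line as a blend of the series and its fitted linear trend. The function names are hypothetical, and deseasonalization and the forecast combination step are left out.

```python
def linear_trend(y):
    """Ordinary least squares fit of y against t = 0, 1, ..., n - 1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return intercept, slope


def theta_line(y, theta):
    """Blend the series with its linear trend.

    theta = 0 returns the trend itself; theta = 2 doubles the deviations
    of the series from that trend.
    """
    intercept, slope = linear_trend(y)
    return [theta * v + (1 - theta) * (intercept + slope * t) for t, v in enumerate(y)]


if __name__ == "__main__":
    series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
    print(theta_line(series, 0))
    print(theta_line(series, 2))
```

Averaging the theta = 0 and theta = 2 lines recovers the original series, which is why the classical method forecasts each line separately and then combines the two forecasts.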
Multiple Seasonalities
Suited for signals with more than one clear seasonality. Useful for low-frequency data like electricity and logs.
Model | Point Forecast | Probabilistic Forecast | Insample fitted values | Probabilistic fitted values | Exogenous features |
---|---|---|---|---|---|
MSTL | ✅ | ✅ | ✅ | ✅ | If trend forecaster supports |
GARCH and ARCH Models
Suited for modeling time series that exhibit non-constant volatility over time. The ARCH model is a particular case of GARCH.
Model | Point Forecast | Probabilistic Forecast | Insample fitted values | Probabilistic fitted values | Exogenous features |
---|---|---|---|---|---|
GARCH | ✅ | ✅ | ✅ | ✅ | |
ARCH | ✅ | ✅ | ✅ | ✅ | |
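To show why ARCH is a particular case of GARCH, the sketch below writes out the GARCH(1,1) conditional variance recursion and obtains ARCH(1) by setting the persistence term to zero. The parameter values are supplied by the caller and not estimated. The starting value of the recursion is an assumption, and the function names are hypothetical.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances: sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    # Assumption: start the recursion from the mean of the squared returns.
    sigma2 = [sum(r * r for r in returns) / len(returns)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def arch1_variance(returns, omega, alpha):
    """ARCH(1) is GARCH(1,1) with beta = 0: no memory of past variances."""
    return garch11_variance(returns, omega, alpha, beta=0.0)


if __name__ == "__main__":
    returns = [1.0, 2.0, 0.0]
    print(arch1_variance(returns, omega=0.1, alpha=0.5))
    print(garch11_variance(returns, omega=0.1, alpha=0.5, beta=0.5))
```

With beta set to zero, each variance depends only on the previous squared return, while a positive beta lets volatility shocks persist over several periods.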
Baseline Models
Classical models for establishing a baseline.
Model | Point Forecast | Probabilistic Forecast | Insample fitted values | Probabilistic fitted values | Exogenous features |
---|---|---|---|---|---|
HistoricAverage | ✅ | ✅ | ✅ | ✅ | |
Naive | ✅ | ✅ | ✅ | ✅ | |
RandomWalkWithDrift | ✅ | ✅ | ✅ | ✅ | |
SeasonalNaive | ✅ | ✅ | ✅ | ✅ | |
WindowAverage | ✅ | | | | |
SeasonalWindowAverage | ✅ | | | | |
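For readers unfamiliar with these baselines, the sketch below gives plain-Python versions of the point forecasts behind several of them. The functions follow the textbook definitions and are not the library's implementations. Names and signatures are chosen for illustration only.

```python
def naive(y, h):
    """Repeat the last observed value."""
    return [y[-1]] * h


def seasonal_naive(y, h, season_length):
    """Repeat the last observed season."""
    start = len(y) - season_length
    return [y[start + (k % season_length)] for k in range(h)]


def historic_average(y, h):
    """Repeat the mean of all observations."""
    return [sum(y) / len(y)] * h


def window_average(y, h, window_size):
    """Repeat the mean of the last `window_size` observations."""
    window = y[-window_size:]
    return [sum(window) / len(window)] * h


def random_walk_with_drift(y, h):
    """Extend the last value along the average historical change per step."""
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + drift * (k + 1) for k in range(h)]


if __name__ == "__main__":
    series = [1, 2, 3, 4, 5, 6]
    print(naive(series, 2))
    print(seasonal_naive(series, 4, season_length=3))
    print(historic_average(series, 1))
    print(window_average(series, 1, window_size=2))
    print(random_walk_with_drift(series, 2))
```

Because these forecasts are so cheap to compute, they make a useful sanity check: a more complex model that cannot beat them is rarely worth deploying.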
Exponential Smoothing
Uses a weighted average of all past observations where the weights decrease exponentially into the past. Suitable for data with clear trend and/or seasonality. Use the SimpleExponential family for data with no clear trend or seasonality.
Model | Point Forecast | Probabilistic Forecast | Insample fitted values | Probabilistic fitted values | Exogenous features |
---|---|---|---|---|---|
SimpleExponentialSmoothing | ✅ | | | | |
SimpleExponentialSmoothingOptimized | ✅ | | | | |
SeasonalExponentialSmoothing | ✅ | | | | |
SeasonalExponentialSmoothingOptimized | ✅ | | | | |
Holt | ✅ | ✅ | ✅ | ✅ | |
HoltWinters | ✅ | ✅ | ✅ | ✅ | |
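The exponentially decreasing weights mentioned above are easiest to see in simple exponential smoothing. The following sketch is the textbook recursion with a fixed smoothing parameter. Initializing the level with the first observation is an assumption, and the function names are hypothetical.

```python
def simple_exponential_smoothing(y, alpha, h):
    """Flat forecast from the recursion level = alpha * y[t] + (1 - alpha) * level."""
    level = y[0]  # assumption: start from the first observation
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * h


def smoothing_weights(alpha, n):
    """Weight given to the most recent n observations, newest first."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


if __name__ == "__main__":
    print(simple_exponential_smoothing([10, 20], alpha=0.5, h=2))
    print(smoothing_weights(alpha=0.5, n=3))
```

A larger alpha makes the forecast react faster to recent observations, and at alpha = 1 the method reduces to the naive forecast.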
Sparse or Intermittent
Suited for series with very few non-zero observations.
Model | Point Forecast | Probabilistic Forecast | Insample fitted values | Probabilistic fitted values | Exogenous features |
---|---|---|---|---|---|
ADIDA | ✅ | ✅ | ✅ | | |
CrostonClassic | ✅ | ✅ | ✅ | | |
CrostonOptimized | ✅ | ✅ | ✅ | | |
CrostonSBA | ✅ | ✅ | ✅ | | |
IMAPA | ✅ | ✅ | ✅ | | |
TSB | ✅ | ✅ | ✅ | | |
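As an illustration of how intermittent-demand methods work, here is a sketch of the classic Croston approach: smooth the non-zero demand sizes and the gaps between them separately, then forecast their ratio. It follows the textbook description. The default smoothing parameter is an arbitrary choice for this example and the function name is hypothetical.

```python
def croston_classic(y, h, alpha=0.1):
    """Forecast demand per period as smoothed demand size / smoothed interval."""
    demand = None      # smoothed size of non-zero demands
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand
    for value in y:
        if value != 0:
            if demand is None:
                # The first non-zero demand initializes both components.
                demand, interval = value, periods_since
            else:
                demand = alpha * value + (1 - alpha) * demand
                interval = alpha * periods_since + (1 - alpha) * interval
            periods_since = 1
        else:
            periods_since += 1
    if demand is None:
        return [0.0] * h  # no demand was ever observed
    return [demand / interval] * h


if __name__ == "__main__":
    print(croston_classic([0, 4, 0, 4, 0, 4], h=3))
```

Both components are updated only in periods with demand, so long runs of zeros do not drag the estimate of the demand size toward zero.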
How to contribute
See CONTRIBUTING.md.
Citing
@misc{garza2022statsforecast,
    author={Federico Garza and Max Mergenthaler Canseco and Cristian Challú and Kin G. Olivares},
    title = {{StatsForecast}: Lightning fast forecasting with statistical and econometric models},
    year={2022},
    howpublished={{PyCon} Salt Lake City, Utah, US 2022},
    url={https://github.com/Nixtla/statsforecast}
}
Contributors ✨
Thanks go to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind welcome!