statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.

4,198

301

111

Top Related Projects

prophet

18,752

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

sktime

8,122

A unified framework for machine learning with time series

pmdarima

1,611

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pyflux

2,121

Open source time series library for Python

orbit

1,931

A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.

greykite

1,824

A flexible, intuitive and fast forecasting library

Quick Overview

Statsforecast is a Python library for time series forecasting that focuses on statistical and econometric models. It provides fast and accurate implementations of popular forecasting algorithms, including ARIMA, ETS, and various other statistical methods. The library is designed to be efficient, scalable, and easy to use for both beginners and advanced users.

Pros

High performance: Uses Numba-compiled routines on top of NumPy for fast computations, making it suitable for large-scale forecasting tasks
Wide range of models: Offers a variety of statistical and econometric forecasting methods
Easy integration: Compatible with popular data science libraries like pandas and scikit-learn
Automatic model selection: Includes features for automatic model selection and hyperparameter tuning

Cons

Limited to statistical models: Does not include machine learning or deep learning-based forecasting methods
Steeper learning curve: May require more statistical knowledge compared to some other forecasting libraries
Less extensive documentation: While improving, the documentation may not be as comprehensive as more established libraries

Code Examples

Basic forecasting with AutoARIMA:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

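# df is a long-format DataFrame with the columns unique_id, ds, and y
model = StatsForecast(models=[AutoARIMA()], freq='D')
model.fit(df)
forecast = model.predict(h=30)  # predict() uses the models stored by fit()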

Fitting several models in one pass:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS, Naive

models = [AutoARIMA(), AutoETS(), Naive()]
sf = StatsForecast(models=models, freq='D')
sf.fit(df)
forecast = sf.predict(h=30)  # one forecast column per model

Cross-validation for model evaluation:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

model = StatsForecast(models=[AutoARIMA()], freq='D')
# Pass df by keyword: the first positional argument of cross_validation is h
cv_results = model.cross_validation(df=df, h=30, step_size=1, n_windows=5)
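
To make the roles of h, step_size, and n_windows more concrete, here is a minimal sketch of how rolling-origin evaluation windows are commonly laid out. The helper rolling_origin_windows is hypothetical and is not part of StatsForecast. It assumes the last test block ends at the final observation and that each earlier window is shifted back by step_size. The library's internal logic may differ.

```python
def rolling_origin_windows(n_obs, h, step_size, n_windows):
    """Return (cutoff, test_end) index pairs, with test_end exclusive.

    Training data for a window is y[:cutoff]; its test block is
    y[cutoff:test_end]. Assumed layout: the last test block ends at
    the final observation.
    """
    test_size = h + step_size * (n_windows - 1)
    if test_size >= n_obs:
        raise ValueError("series too short for the requested windows")
    windows = []
    for i in range(n_windows):
        cutoff = n_obs - test_size + i * step_size  # first index of the test block
        windows.append((cutoff, cutoff + h))
    return windows


for cutoff, test_end in rolling_origin_windows(n_obs=100, h=30, step_size=1, n_windows=5):
    print(f"train [0, {cutoff})  test [{cutoff}, {test_end})")
```

With step_size=1 the test blocks overlap heavily. Setting step_size equal to h would give non-overlapping blocks at the cost of needing a longer series.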

Getting Started

To get started with Statsforecast, follow these steps:

Install the library:

pip install statsforecast

Import the necessary modules and create a sample dataset:

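import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Create a sample dataset in the long format StatsForecast expects:
# unique_id (series identifier), ds (timestamp), y (value)
dates = pd.date_range(start='2020-01-01', end='2022-12-31', freq='D')
values = np.random.randn(len(dates)).cumsum()
df = pd.DataFrame({'unique_id': 'series_1', 'ds': dates, 'y': values})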

Fit a model and generate forecasts:

model = StatsForecast(models=[AutoARIMA()], freq='D')
model.fit(df)
forecast = model.predict(h=30)
print(forecast)

Competitor Comparisons

prophet

18,752

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Pros of Prophet

User-friendly interface with automatic handling of seasonality and holidays
Robust handling of missing data and outliers
Extensive documentation and community support

Cons of Prophet

Can be slower for large datasets or many time series
Less flexibility for custom models or advanced statistical techniques
May overfit on datasets with limited historical data

Code Comparison

Prophet:

from prophet import Prophet  # the package was renamed from fbprophet in v1.0
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)

StatsForecast:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
fcst = StatsForecast(models=[AutoARIMA()], freq='D')
forecast = fcst.forecast(df=df, h=365)

Key Differences

Prophet focuses on an additive model with intuitive parameters, while StatsForecast offers a variety of statistical models
StatsForecast is generally faster, especially for multiple time series
Prophet provides built-in plotting and diagnostics, whereas StatsForecast relies more on external visualization tools
StatsForecast offers more advanced statistical models and the ability to easily combine multiple forecasting methods

Both libraries have their strengths: Prophet excels in ease of use and interpretability, while StatsForecast offers more flexibility and performance for advanced users and large-scale forecasting tasks.

sktime

8,122

A unified framework for machine learning with time series

Pros of sktime

Broader scope, covering various time series tasks beyond forecasting
Extensive ecosystem with many algorithms and transformers
Strong integration with scikit-learn and pandas

Cons of sktime

Steeper learning curve due to its comprehensive nature
Potentially slower performance for some forecasting tasks
Less focus on probabilistic forecasting compared to StatsForecast

Code Comparison

sktime example:

from sktime.forecasting.arima import ARIMA
from sktime.datasets import load_airline

y = load_airline()
forecaster = ARIMA(order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
forecaster.fit(y)
y_pred = forecaster.predict(fh=[1, 2, 3])

StatsForecast example:

from statsforecast import StatsForecast
from statsforecast.models import ARIMA

# season_length carries the seasonal period; seasonal_order is (P, D, Q)
sf = StatsForecast(
    models=[ARIMA(order=(1, 1, 1), season_length=12, seasonal_order=(1, 1, 1))],
    freq='ME',  # assumes month-end timestamps; use 'M' on pandas < 2.2
)
sf.fit(df)
forecasts = sf.predict(h=3)

Both libraries offer ARIMA forecasting, but sktime provides a more scikit-learn-like API, while StatsForecast focuses on simplicity and performance for forecasting tasks.

pmdarima

1,611

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

Pros of pmdarima

More mature project with a longer history and larger user base
Extensive documentation and examples for various use cases
Supports a wider range of ARIMA-based models, including SARIMAX

Cons of pmdarima

Slower performance, especially for large datasets or multiple time series
Less focus on modern forecasting techniques beyond ARIMA-based models
More complex API, requiring more code for basic forecasting tasks

Code Comparison

pmdarima:

from pmdarima import auto_arima

model = auto_arima(y, seasonal=True, m=12)
forecast = model.predict(n_periods=12)

statsforecast:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# season_length=12 mirrors m=12 in the pmdarima call
fcst = StatsForecast(models=[AutoARIMA(season_length=12)], freq='ME')
forecast = fcst.forecast(df=df, h=12)

Both libraries offer AutoARIMA functionality, but statsforecast provides a more streamlined API for working with multiple time series and integrates well with pandas DataFrames. pmdarima offers more granular control over model parameters, which can be beneficial for advanced users but may require more code for basic forecasting tasks.

pyflux

2,121

Open source time series library for Python

Pros of PyFlux

Offers a wider range of time series models, including ARIMA, GARCH, and state space models
Provides Bayesian inference capabilities for parameter estimation
Includes built-in plotting functions for model diagnostics and forecasts

Cons of PyFlux

Less actively maintained, with the last update in 2018
Slower performance for large datasets compared to StatsForecast
Limited documentation and community support

Code Comparison

PyFlux:

from pyflux.arima import ARIMA

model = ARIMA(data=df, ar=1, ma=1, target='y')
model.fit()
forecast = model.predict(h=5)

StatsForecast:

from statsforecast import StatsForecast
from statsforecast.models import ARIMA

sf = StatsForecast(models=[ARIMA(order=(1, 0, 1))], freq='D')  # freq is required
sf.fit(df)
forecast = sf.predict(h=5)

Both libraries offer similar functionality for time series forecasting, but StatsForecast provides a more modern and efficient implementation with better performance for large-scale forecasting tasks. PyFlux offers a broader range of models and Bayesian inference capabilities, but lacks recent updates and community support.

orbit

1,931

A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.

Pros of Orbit

Supports Bayesian modeling, allowing for uncertainty quantification
Offers a wider range of models, including custom model creation
Provides built-in visualization tools for model diagnostics

Cons of Orbit

Steeper learning curve due to more complex API
Slower performance compared to StatsForecast's optimized implementations
Less focus on traditional statistical models

Code Comparison

StatsForecast:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(
    models=[AutoARIMA()],
    freq='D',
    n_jobs=-1
)
forecasts = sf.forecast(df=data, h=30)

Orbit:

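import pandas as pd
from orbit.models import DLT

dlt = DLT(
    response_col='y',
    date_col='ds',
    seasonality=7,  # DLT takes a single integer seasonal period
)
dlt.fit(df=data)
# Orbit forecasts the dates present in the frame passed to predict()
future = pd.DataFrame({'ds': pd.date_range(data['ds'].max(), periods=31, freq='D')[1:]})
predictions = dlt.predict(df=future)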

Both libraries offer time series forecasting capabilities, but they cater to different use cases. StatsForecast focuses on traditional statistical models with high performance, while Orbit provides a more flexible framework for Bayesian modeling and custom model creation. The choice between them depends on the specific requirements of the forecasting task and the user's familiarity with different modeling approaches.

greykite

1,824

A flexible, intuitive and fast forecasting library

Pros of Greykite

More comprehensive feature set for advanced forecasting scenarios
Stronger focus on interpretability and explainability of models
Better suited for complex, multi-variate time series forecasting

Cons of Greykite

Steeper learning curve due to more complex API and configuration options
Slower execution times for large datasets compared to StatsForecast
Less emphasis on traditional statistical methods

Code Comparison

StatsForecast:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(models=[AutoARIMA()], freq='D')
forecasts = sf.forecast(df=df, h=30)

Greykite:

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.forecaster import Forecaster

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df,
    config=ForecastConfig(
        model_template="AUTO",
        forecast_horizon=30
    )
)

Both libraries offer high-level APIs for forecasting, but Greykite's approach is more configurable and verbose. StatsForecast provides a simpler interface for quick forecasting tasks, while Greykite offers more control over the forecasting process and model selection.

README

Nixtla

Statistical ⚡️ Forecast

Lightning fast forecasting with statistical and econometric models

StatsForecast offers a collection of widely used univariate time series forecasting models, including automatic ARIMA, ETS, CES, and Theta modeling optimized for high performance using numba. It also includes a large battery of benchmarking models.

Installation

You can install StatsForecast with:

pip install statsforecast

conda install -c conda-forge statsforecast

Visit our Installation Guide for further instructions.

Quick Start

Minimal Example

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.utils import AirPassengersDF

df = AirPassengersDF
sf = StatsForecast(
    models=[AutoARIMA(season_length=12)],
    freq='ME',
)
sf.fit(df)
sf.predict(h=12, level=[95])

Get Started with this quick guide.

Follow this end-to-end walkthrough for best practices.

Why?

Current Python alternatives for statistical models are slow, inaccurate and don't scale well. So we created a library that can be used to forecast in production environments or as benchmarks. StatsForecast includes an extensive battery of models that can efficiently fit millions of time series.

Features

Fastest and most accurate implementations of AutoARIMA, AutoETS, AutoCES, MSTL and Theta in Python.
Out-of-the-box compatibility with Spark, Dask, and Ray.
Probabilistic Forecasting and Confidence Intervals.
Support for exogenous Variables and static covariates.
Anomaly Detection.
Familiar sklearn syntax: .fit and .predict.

Highlights

Inclusion of exogenous variables and prediction intervals for ARIMA.
20x faster than pmdarima.
1.5x faster than R.
500x faster than Prophet.
4x faster than statsmodels.
Compiled to high performance machine code through numba.
1,000,000 series in 30 min with ray.
Replace FB-Prophet in two lines of code and gain speed and accuracy. Check the experiments here.
Fit 10 benchmark models on 1,000,000 series in under 5 min.

Missing something? Please open an issue or write us in

Examples and Guides

End to End Walkthrough: Model training, evaluation and selection for multiple time series

Anomaly Detection: detect anomalies for time series using in-sample prediction intervals.

Multiple Seasonalities: how to forecast data with multiple seasonalities using an MSTL.

Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.

Intermittent Demand: forecast series with very few non-zero observations.

Exogenous Regressors: like weather or prices

Models

Automatic Forecasting

Automatic forecasting tools search for the best parameters and select the best possible model for a group of time series. These tools are useful for large collections of univariate time series.

Model	Point Forecast	Probabilistic Forecast	Insample fitted values	Probabilistic fitted values	Exogenous features
AutoARIMA	✅	✅	✅	✅	✅
AutoETS	✅	✅	✅	✅
AutoCES	✅	✅	✅	✅
AutoTheta	✅	✅	✅	✅
AutoMFLES	✅	✅	✅	✅	✅
AutoTBATS	✅	✅	✅	✅

ARIMA Family

These models exploit the existing autocorrelations in the time series.

Model	Point Forecast	Probabilistic Forecast	Insample fitted values	Probabilistic fitted values	Exogenous features
ARIMA	✅	✅	✅	✅	✅
AutoRegressive	✅	✅	✅	✅	✅

Theta Family

Fit two theta lines to a deseasonalized time series, using different techniques to obtain and combine the two theta lines to produce the final forecasts.

Model	Point Forecast	Probabilistic Forecast	Insample fitted values	Probabilistic fitted values
Theta	✅	✅	✅	✅
OptimizedTheta	✅	✅	✅	✅
DynamicTheta	✅	✅	✅	✅
DynamicOptimizedTheta	✅	✅	✅	✅

Multiple Seasonalities

Suited for signals with more than one clear seasonality. Useful for high-frequency data like electricity load and logs.

Model	Point Forecast	Probabilistic Forecast	Insample fitted values	Probabilistic fitted values	Exogenous features
MSTL	✅	✅	✅	✅	If trend forecaster supports
MFLES	✅	✅	✅	✅	✅
TBATS	✅	✅	✅	✅

GARCH and ARCH Models

Suited for modeling time series that exhibit non-constant volatility over time. The ARCH model is a particular case of GARCH.

Model	Point Forecast	Probabilistic Forecast	Insample fitted values	Probabilistic fitted values	Exogenous features
GARCH	✅	✅	✅	✅
ARCH	✅	✅	✅	✅
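
The relationship between the two models can be seen in the conditional variance recursion of a GARCH(1,1) process, sigma2_t = omega + alpha * r_(t-1)^2 + beta * sigma2_(t-1). Setting beta to zero leaves an ARCH(1) model. The sketch below is illustrative only: garch11_variance is a hypothetical helper, the parameters are given rather than estimated, and the recursion is started from the unconditional variance as an assumption.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Return conditional variances for a GARCH(1,1) recursion.

    The result has len(returns) + 1 entries; the last one is the
    one-step-ahead variance forecast. Requires alpha + beta < 1.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1")
    sigma2 = [omega / (1 - alpha - beta)]  # unconditional variance (assumed start)
    for r in returns:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2


garch = garch11_variance([0.0, 1.0], omega=0.1, alpha=0.1, beta=0.8)
arch = garch11_variance([0.0, 1.0], omega=0.1, alpha=0.1, beta=0.0)  # ARCH(1)
print(garch, arch)
```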

Baseline Models

Classical models for establishing baseline.

Model	Point Forecast	Probabilistic Forecast	Insample fitted values	Probabilistic fitted values
HistoricAverage	✅	✅	✅	✅
Naive	✅	✅	✅	✅
RandomWalkWithDrift	✅	✅	✅	✅
SeasonalNaive	✅	✅	✅	✅
WindowAverage	✅
SeasonalWindowAverage	✅
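
For orientation, the point forecasts of three of these baselines can be written in a few lines of plain Python. The functions below are hypothetical stand-ins that follow the textbook definitions; they are not the library's implementations and ignore prediction intervals.

```python
def naive(y, h):
    """Repeat the last observed value."""
    return [y[-1]] * h


def seasonal_naive(y, h, season_length):
    """Repeat the last full season, cycling through it as needed."""
    last_season = y[-season_length:]
    return [last_season[i % season_length] for i in range(h)]


def historic_average(y, h):
    """Repeat the mean of all observations."""
    mean = sum(y) / len(y)
    return [mean] * h


y = [1, 2, 3, 4, 5, 6]
print(naive(y, 2))
print(seasonal_naive(y, 4, season_length=3))
print(historic_average(y, 2))
```

Comparing a more elaborate model against baselines like these is a quick way to check whether the extra complexity is paying off.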

Exponential Smoothing

Uses a weighted average of all past observations where the weights decrease exponentially into the past. Suitable for data with clear trend and/or seasonality. Use the SimpleExponential family for data with no clear trend or seasonality.

Model	Point Forecast	Probabilistic Forecast	Insample fitted values	Probabilistic fitted values
SimpleExponentialSmoothing	✅		✅
SimpleExponentialSmoothingOptimized	✅		✅
SeasonalExponentialSmoothing	✅		✅
SeasonalExponentialSmoothingOptimized	✅		✅
Holt	✅	✅	✅	✅
HoltWinters	✅	✅	✅	✅
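
The exponentially decreasing weights mentioned above come from a one-line recursion, level = alpha * y_t + (1 - alpha) * level. The sketch below shows simple exponential smoothing under two assumptions: alpha is supplied by the caller, and the level is initialized to the first observation. The function name is hypothetical and the code is not the library's implementation.

```python
def simple_exponential_smoothing(y, alpha, h):
    """Return a flat h-step forecast from the final smoothed level."""
    level = y[0]  # assumed initialization
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h


print(simple_exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5, h=2))
```

A larger alpha puts more weight on recent observations, while a smaller alpha produces a smoother, slower-moving level.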

Sparse or Intermittent

Suited for series with very few non-zero observations

Model	Point Forecast	Insample fitted values	Probabilistic fitted values
ADIDA	✅	✅	✅
CrostonClassic	✅	✅	✅
CrostonOptimized	✅	✅	✅
CrostonSBA	✅	✅	✅
IMAPA	✅	✅	✅
TSB	✅	✅	✅

How to contribute

See CONTRIBUTING.md.

Citing

@misc{garza2022statsforecast,
    author={Azul Garza, Max Mergenthaler Canseco, Cristian Challú, Kin G. Olivares},
    title = {{StatsForecast}: Lightning fast forecasting with statistical and econometric models},
    year={2022},
    howpublished={{PyCon} Salt Lake City, Utah, US 2022},
    url={https://github.com/Nixtla/statsforecast}
}

Contributors ✨

Thanks goes to these wonderful people:

azul	José Morales	Sugato Ray	Jeff Tackes	darinkist	Alec Helyar	Dave Hirschfeld
mergenthaler	Kin	Yasslight90	asinig	Philip Gilließen	Sebastian Hagn	Han Wang
Ben Jeffrey	Beliavsky	Mariana Menchero García	Nikhil Gupta	JD	josh attenberg	JeroenPeterBos
Jeroen Van Der Donckt	Roymprog	Nelson Cárdenas Bolaño	Kyle Schmaus	Akmal Soliev	Nick To	Kevin Kho
Yiben Huang	Andrew Gross	taniishkaaa	Manuel Calzolari