pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

1,640

239

1,640

View on GitHub

Top Related Projects

statsmodels

10,608

Statsmodels: statistical modeling and econometrics in Python

prophet

19,137

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

pyflux

2,128

Open source time series library for Python

orbit

1,964

A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.

sktime

8,358

A unified framework for machine learning with time series

statsforecast

4,424

Lightning ⚡️ fast forecasting with statistical and econometric models.

Quick Overview

pmdarima is a Python library for time series forecasting that brings R's auto.arima functionality to Python. It provides tools for automatic ARIMA model selection, forecasting, and statistical testing, making it easier for data scientists and analysts to work with time series data in Python.

Pros

Automatic model selection simplifies the ARIMA modeling process
Includes advanced features like seasonal decomposition and exogenous variables
Compatible with scikit-learn's API, allowing easy integration with existing ML workflows
Comprehensive documentation and examples

Cons

Can be computationally intensive for large datasets
May not always find the optimal model, especially for complex time series
Limited to ARIMA-based models, which may not be suitable for all time series problems
Steeper learning curve compared to simpler forecasting libraries

Code Examples

Basic ARIMA model fitting and forecasting:

from pmdarima import auto_arima
import numpy as np

# Generate sample data
data = np.random.normal(0, 1, 100)

# Fit ARIMA model
model = auto_arima(data, start_p=1, start_q=1, max_p=5, max_q=5, seasonal=False)

# Make forecasts
forecasts = model.predict(n_periods=10)

Seasonal decomposition:

from pmdarima.preprocessing import decompose
import pandas as pd

# Load data
data = pd.read_csv('time_series_data.csv', parse_dates=['date'], index_col='date')

# Perform seasonal decomposition
decomposition = decompose(data['value'], 'additive', m=12)
trend, seasonal, residual = decomposition.trend, decomposition.seasonal, decomposition.resid

ARIMA model with exogenous variables:

from pmdarima import auto_arima
import pandas as pd

# Load data with exogenous variables
data = pd.read_csv('data_with_exog.csv', parse_dates=['date'], index_col='date')

# Fit ARIMA model with exogenous variables
model = auto_arima(data['target'], exogenous=data[['exog1', 'exog2']])

# Forecast with future exogenous values
future_exog = pd.DataFrame({'exog1': [1, 2, 3], 'exog2': [4, 5, 6]})
forecasts = model.predict(n_periods=3, X=future_exog)

Getting Started

To get started with pmdarima, follow these steps:

Install the library:

pip install pmdarima

Import the necessary modules:

from pmdarima import auto_arima
import pandas as pd
import matplotlib.pyplot as plt

Load your time series data and fit a model:

data = pd.read_csv('your_data.csv', parse_dates=['date'], index_col='date')
model = auto_arima(data['value'], seasonal=True, m=12)

Make forecasts and plot the results:

forecasts = model.predict(n_periods=24)
plt.plot(data.index, data['value'], label='Observed')
plt.plot(pd.date_range(start=data.index[-1], periods=25, freq='M')[1:], forecasts, label='Forecast')
plt.legend()
plt.show()

Competitor Comparisons

statsmodels

10,608

Statsmodels: statistical modeling and econometrics in Python

Pros of statsmodels

Comprehensive statistical library with a wide range of models and tests
Well-established and extensively documented
Integrates seamlessly with other scientific Python libraries

Cons of statsmodels

Steeper learning curve for beginners
Can be slower for certain operations compared to specialized libraries
Less focused on time series analysis specifically

Code Comparison

statsmodels:

import statsmodels.api as sm
model = sm.tsa.ARIMA(data, order=(1,1,1))
results = model.fit()
forecast = results.forecast(steps=5)

pmdarima:

from pmdarima import auto_arima
model = auto_arima(data, start_p=1, start_q=1, max_p=5, max_q=5)
forecast = model.predict(n_periods=5)

Key Differences

statsmodels offers a broader range of statistical models and tests
pmdarima focuses specifically on time series analysis and forecasting
pmdarima provides automated model selection with auto_arima
statsmodels requires manual specification of ARIMA parameters
pmdarima has a more user-friendly API for time series tasks

Both libraries have their strengths, with statsmodels being more comprehensive and pmdarima offering specialized time series functionality. The choice between them depends on the specific requirements of your project and your familiarity with time series analysis.

prophet

19,137

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Pros of Prophet

User-friendly interface with automatic handling of seasonality and holidays
Robust to missing data and outliers
Supports multiple seasonalities and custom seasonalities

Cons of Prophet

Less flexible for complex time series patterns
Can be slower for large datasets
Limited options for fine-tuning model parameters

Code Comparison

Prophet:

from fbprophet import Prophet
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)

pmdarima:

from pmdarima import auto_arima
model = auto_arima(y, seasonal=True, m=12)
forecast = model.predict(n_periods=365)

Key Differences

Prophet focuses on ease of use and automatic forecasting
pmdarima offers more control over model selection and parameters
Prophet handles seasonality and holidays out-of-the-box
pmdarima provides a wider range of ARIMA-based models

Use Cases

Prophet is ideal for:

Quick forecasting with minimal configuration
Handling multiple seasonalities and holiday effects

pmdarima is better suited for:

More complex time series analysis
Fine-tuning model parameters
Exploring various ARIMA-based models

Both libraries have their strengths, and the choice depends on the specific requirements of the forecasting task at hand.

pyflux

2,128

Open source time series library for Python

Pros of pyflux

Offers a wider range of time series models, including ARIMA, GARCH, and state space models
Provides Bayesian inference capabilities for more robust uncertainty quantification
Includes built-in plotting functions for easy visualization of results

Cons of pyflux

Less actively maintained, with fewer recent updates compared to pmdarima
Documentation is less comprehensive and may be outdated in some areas
Smaller community and fewer contributors, potentially leading to slower bug fixes and feature additions

Code Comparison

pyflux example:

import pyflux as pf
model = pf.ARIMA(data=df['column'], ar=1, ma=1, integ=1)
result = model.fit()
forecast = model.predict(h=5)

pmdarima example:

from pmdarima import auto_arima
model = auto_arima(df['column'], start_p=1, start_q=1, d=1, max_p=5, max_q=5)
forecast = model.predict(n_periods=5)

Both libraries offer ARIMA modeling capabilities, but pyflux provides a more explicit model specification, while pmdarima's auto_arima function automatically selects the best parameters. pmdarima focuses on ease of use and automation, while pyflux offers more flexibility in model customization.

orbit

1,964

A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.

Pros of Orbit

Supports Bayesian modeling for time series forecasting
Handles multiple seasonality patterns
Provides built-in visualization tools for model diagnostics

Cons of Orbit

Steeper learning curve due to its Bayesian approach
More computationally intensive, especially for large datasets
Less extensive documentation compared to pmdarima

Code Comparison

pmdarima example:

from pmdarima import auto_arima

model = auto_arima(y, seasonal=True, m=12)
forecast = model.predict(n_periods=12)

Orbit example:

from orbit.models import DLT

model = DLT(response_col='y', date_col='ds', seasonality=52)
model.fit(df)
prediction = model.predict(df)

Key Differences

pmdarima focuses on traditional ARIMA models, while Orbit emphasizes Bayesian structural time series
Orbit offers more flexibility in modeling complex seasonality patterns
pmdarima provides automated model selection (auto_arima), whereas Orbit requires more manual configuration
Orbit integrates well with Pandas DataFrames, while pmdarima works primarily with numpy arrays

Both libraries have their strengths, with pmdarima being more accessible for beginners and Orbit offering advanced capabilities for complex time series modeling.

sktime

8,358

A unified framework for machine learning with time series

Pros of sktime

Broader scope: Covers various time series tasks beyond forecasting, including classification and regression
Scikit-learn compatible: Integrates well with the scikit-learn ecosystem
Extensive functionality: Offers a wide range of algorithms and tools for time series analysis

Cons of sktime

Steeper learning curve: More complex API due to its broader scope
Less specialized: May not offer as deep functionality for specific ARIMA-related tasks

Code Comparison

sktime example:

from sktime.forecasting.arima import ARIMA
forecaster = ARIMA(order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])

pmdarima example:

from pmdarima import auto_arima
model = auto_arima(y_train, seasonal=True, m=12)
forecasts = model.predict(n_periods=3)

The main difference is that pmdarima offers automatic ARIMA model selection with auto_arima, while sktime requires manual specification of ARIMA parameters. However, sktime provides a more consistent interface across different time series tasks and algorithms.

statsforecast

4,424

Lightning ⚡️ fast forecasting with statistical and econometric models.

Pros of statsforecast

Faster execution times, especially for large datasets
Supports a wider range of statistical forecasting models
Designed for scalability and high-performance computing

Cons of statsforecast

Less mature and potentially less stable than pmdarima
May have a steeper learning curve for users familiar with pmdarima
Documentation might not be as comprehensive as pmdarima's

Code Comparison

statsforecast:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(models=[AutoARIMA()], freq='D')
sf.fit(df)
forecasts = sf.predict(h=30)

pmdarima:

from pmdarima import auto_arima

model = auto_arima(y, seasonal=True, m=12)
forecasts = model.predict(n_periods=30)

Both libraries offer AutoARIMA functionality, but statsforecast is designed for handling multiple time series simultaneously, while pmdarima focuses on single time series analysis. statsforecast's API is more oriented towards batch processing, which can be advantageous for large-scale forecasting tasks. However, pmdarima's syntax might be more intuitive for users familiar with traditional statistical modeling approaches.

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

pmdarima

Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time series analysis capabilities. This includes:

The equivalent of R's auto.arima functionality
A collection of statistical tests of stationarity and seasonality
Time series utilities, such as differencing and inverse differencing
Numerous endogenous and exogenous transformers and featurizers, including Box-Cox and Fourier transformations
Seasonal time series decompositions
Cross-validation utilities
A rich collection of built-in time series datasets for prototyping and examples
Scikit-learn-esque pipelines to consolidate your estimators and promote productionization

Pmdarima wraps statsmodels under the hood, but is designed with an interface that's familiar to users coming from a scikit-learn background.

Installation

pip

Pmdarima has binary and source distributions for Windows, Mac and Linux (manylinux) on pypi under the package name pmdarima and can be downloaded via pip:

pip install pmdarima

conda

Pmdarima also has Mac and Linux builds available via conda and can be installed like so:

conda config --add channels conda-forge
conda config --set channel_priority strict
conda install pmdarima

Note: We do not maintain our own Conda binaries, they are maintained at https://github.com/conda-forge/pmdarima-feedstock. See that repo for further documentation on working with Pmdarima on Conda.

Quickstart Examples

Fitting a simple auto-ARIMA on the wineind dataset:

import pmdarima as pm
from pmdarima.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Load/split your data
y = pm.datasets.load_wineind()
train, test = train_test_split(y, train_size=150)

# Fit your model
model = pm.auto_arima(train, seasonal=True, m=12)

# make your forecasts
forecasts = model.predict(test.shape[0])  # predict N steps into the future

# Visualize the forecasts (blue=train, green=forecasts)
x = np.arange(y.shape[0])
plt.plot(x[:150], train, c='blue')
plt.plot(x[150:], forecasts, c='green')
plt.show()

Fitting a more complex pipeline on the sunspots dataset, serializing it, and then loading it from disk to make predictions:

import pmdarima as pm
from pmdarima.model_selection import train_test_split
from pmdarima.pipeline import Pipeline
from pmdarima.preprocessing import BoxCoxEndogTransformer
import pickle

# Load/split your data
y = pm.datasets.load_sunspots()
train, test = train_test_split(y, train_size=2700)

# Define and fit your pipeline
pipeline = Pipeline([
    ('boxcox', BoxCoxEndogTransformer(lmbda2=1e-6)),  # lmbda2 avoids negative values
    ('arima', pm.AutoARIMA(seasonal=True, m=12,
                           suppress_warnings=True,
                           trace=True))
])

pipeline.fit(train)

# Serialize your model just like you would in scikit:
with open('model.pkl', 'wb') as pkl:
    pickle.dump(pipeline, pkl)
    
# Load it and make predictions seamlessly:
with open('model.pkl', 'rb') as pkl:
    mod = pickle.load(pkl)
    print(mod.predict(15))
# [25.20580375 25.05573898 24.4263037  23.56766793 22.67463049 21.82231043
# 21.04061069 20.33693017 19.70906027 19.1509862  18.6555793  18.21577243
# 17.8250318  17.47750614 17.16803394]

Availability

pmdarima is available on PyPi in pre-built Wheel files for Python 3.9+ for the following platforms:

Mac (64-bit)
Linux (64-bit manylinux)
Windows (64-bit)
- 32-bit wheels are available for pmdarima versions below 2.0.0 and Python versions below 3.10

If a wheel doesn't exist for your platform, you can still pip install and it will build from the source distribution tarball, however you'll need cython>=0.29 and gcc (Mac/Linux) or MinGW (Windows) in order to build the package from source.

Note that legacy versions (<1.0.0) are available under the name "pyramid-arima" and can be pip installed via:

# Legacy warning:
$ pip install pyramid-arima
# python -c 'import pyramid;'

However, this is not recommended.

Documentation

All of your questions and more (including examples and guides) can be answered by the pmdarima documentation. If not, always feel free to file an issue.

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot