Convert Figma logo to code with AI

alkaline-ml logopmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

1,579
234
1,579
62

Top Related Projects

Statsmodels: statistical modeling and econometrics in Python

18,363

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

2,113

Open source time series library for Python

1,864

A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.

7,833

A unified framework for machine learning with time series

Lightning ⚡️ fast forecasting with statistical and econometric models.

Quick Overview

pmdarima is a Python library for time series forecasting that brings R's auto.arima functionality to Python. It provides tools for automatic ARIMA model selection, forecasting, and statistical testing, making it easier for data scientists and analysts to work with time series data in Python.

Pros

  • Automatic model selection simplifies the ARIMA modeling process
  • Includes advanced features like seasonal decomposition and exogenous variables
  • Compatible with scikit-learn's API, allowing easy integration with existing ML workflows
  • Comprehensive documentation and examples

Cons

  • Can be computationally intensive for large datasets
  • May not always find the optimal model, especially for complex time series
  • Limited to ARIMA-based models, which may not be suitable for all time series problems
  • Steeper learning curve compared to simpler forecasting libraries

Code Examples

  1. Basic ARIMA model fitting and forecasting:
from pmdarima import auto_arima
import numpy as np

# Generate sample data
data = np.random.normal(0, 1, 100)

# Fit ARIMA model
model = auto_arima(data, start_p=1, start_q=1, max_p=5, max_q=5, seasonal=False)

# Make forecasts
forecasts = model.predict(n_periods=10)
  1. Seasonal decomposition:
from pmdarima.preprocessing import decompose
import pandas as pd

# Load data
data = pd.read_csv('time_series_data.csv', parse_dates=['date'], index_col='date')

# Perform seasonal decomposition
decomposition = decompose(data['value'], 'additive', m=12)
trend, seasonal, residual = decomposition.trend, decomposition.seasonal, decomposition.resid
  1. ARIMA model with exogenous variables:
from pmdarima import auto_arima
import pandas as pd

# Load data with exogenous variables
data = pd.read_csv('data_with_exog.csv', parse_dates=['date'], index_col='date')

# Fit ARIMA model with exogenous variables
model = auto_arima(data['target'], exogenous=data[['exog1', 'exog2']])

# Forecast with future exogenous values
future_exog = pd.DataFrame({'exog1': [1, 2, 3], 'exog2': [4, 5, 6]})
forecasts = model.predict(n_periods=3, X=future_exog)

Getting Started

To get started with pmdarima, follow these steps:

  1. Install the library:
pip install pmdarima
  1. Import the necessary modules:
from pmdarima import auto_arima
import pandas as pd
import matplotlib.pyplot as plt
  1. Load your time series data and fit a model:
data = pd.read_csv('your_data.csv', parse_dates=['date'], index_col='date')
model = auto_arima(data['value'], seasonal=True, m=12)
  1. Make forecasts and plot the results:
forecasts = model.predict(n_periods=24)
plt.plot(data.index, data['value'], label='Observed')
plt.plot(pd.date_range(start=data.index[-1], periods=25, freq='M')[1:], forecasts, label='Forecast')
plt.legend()
plt.show()

Competitor Comparisons

Statsmodels: statistical modeling and econometrics in Python

Pros of statsmodels

  • Comprehensive statistical library with a wide range of models and tests
  • Well-established and extensively documented
  • Integrates seamlessly with other scientific Python libraries

Cons of statsmodels

  • Steeper learning curve for beginners
  • Can be slower for certain operations compared to specialized libraries
  • Less focused on time series analysis specifically

Code Comparison

statsmodels:

import statsmodels.api as sm
model = sm.tsa.ARIMA(data, order=(1,1,1))
results = model.fit()
forecast = results.forecast(steps=5)

pmdarima:

from pmdarima import auto_arima
model = auto_arima(data, start_p=1, start_q=1, max_p=5, max_q=5)
forecast = model.predict(n_periods=5)

Key Differences

  • statsmodels offers a broader range of statistical models and tests
  • pmdarima focuses specifically on time series analysis and forecasting
  • pmdarima provides automated model selection with auto_arima
  • statsmodels requires manual specification of ARIMA parameters
  • pmdarima has a more user-friendly API for time series tasks

Both libraries have their strengths, with statsmodels being more comprehensive and pmdarima offering specialized time series functionality. The choice between them depends on the specific requirements of your project and your familiarity with time series analysis.

18,363

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Pros of Prophet

  • User-friendly interface with automatic handling of seasonality and holidays
  • Robust to missing data and outliers
  • Supports multiple seasonalities and custom seasonalities

Cons of Prophet

  • Less flexible for complex time series patterns
  • Can be slower for large datasets
  • Limited options for fine-tuning model parameters

Code Comparison

Prophet:

from fbprophet import Prophet
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)

pmdarima:

from pmdarima import auto_arima
model = auto_arima(y, seasonal=True, m=12)
forecast = model.predict(n_periods=365)

Key Differences

  • Prophet focuses on ease of use and automatic forecasting
  • pmdarima offers more control over model selection and parameters
  • Prophet handles seasonality and holidays out-of-the-box
  • pmdarima provides a wider range of ARIMA-based models

Use Cases

Prophet is ideal for:

  • Quick forecasting with minimal configuration
  • Handling multiple seasonalities and holiday effects

pmdarima is better suited for:

  • More complex time series analysis
  • Fine-tuning model parameters
  • Exploring various ARIMA-based models

Both libraries have their strengths, and the choice depends on the specific requirements of the forecasting task at hand.

2,113

Open source time series library for Python

Pros of pyflux

  • Offers a wider range of time series models, including ARIMA, GARCH, and state space models
  • Provides Bayesian inference capabilities for more robust uncertainty quantification
  • Includes built-in plotting functions for easy visualization of results

Cons of pyflux

  • Less actively maintained, with fewer recent updates compared to pmdarima
  • Documentation is less comprehensive and may be outdated in some areas
  • Smaller community and fewer contributors, potentially leading to slower bug fixes and feature additions

Code Comparison

pyflux example:

import pyflux as pf
model = pf.ARIMA(data=df['column'], ar=1, ma=1, integ=1)
result = model.fit()
forecast = model.predict(h=5)

pmdarima example:

from pmdarima import auto_arima
model = auto_arima(df['column'], start_p=1, start_q=1, d=1, max_p=5, max_q=5)
forecast = model.predict(n_periods=5)

Both libraries offer ARIMA modeling capabilities, but pyflux provides a more explicit model specification, while pmdarima's auto_arima function automatically selects the best parameters. pmdarima focuses on ease of use and automation, while pyflux offers more flexibility in model customization.

1,864

A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.

Pros of Orbit

  • Supports Bayesian modeling for time series forecasting
  • Handles multiple seasonality patterns
  • Provides built-in visualization tools for model diagnostics

Cons of Orbit

  • Steeper learning curve due to its Bayesian approach
  • More computationally intensive, especially for large datasets
  • Less extensive documentation compared to pmdarima

Code Comparison

pmdarima example:

from pmdarima import auto_arima

model = auto_arima(y, seasonal=True, m=12)
forecast = model.predict(n_periods=12)

Orbit example:

from orbit.models import DLT

model = DLT(response_col='y', date_col='ds', seasonality=52)
model.fit(df)
prediction = model.predict(df)

Key Differences

  • pmdarima focuses on traditional ARIMA models, while Orbit emphasizes Bayesian structural time series
  • Orbit offers more flexibility in modeling complex seasonality patterns
  • pmdarima provides automated model selection (auto_arima), whereas Orbit requires more manual configuration
  • Orbit integrates well with Pandas DataFrames, while pmdarima works primarily with numpy arrays

Both libraries have their strengths, with pmdarima being more accessible for beginners and Orbit offering advanced capabilities for complex time series modeling.

7,833

A unified framework for machine learning with time series

Pros of sktime

  • Broader scope: Covers various time series tasks beyond forecasting, including classification and regression
  • Scikit-learn compatible: Integrates well with the scikit-learn ecosystem
  • Extensive functionality: Offers a wide range of algorithms and tools for time series analysis

Cons of sktime

  • Steeper learning curve: More complex API due to its broader scope
  • Less specialized: May not offer as deep functionality for specific ARIMA-related tasks

Code Comparison

sktime example:

from sktime.forecasting.arima import ARIMA
forecaster = ARIMA(order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])

pmdarima example:

from pmdarima import auto_arima
model = auto_arima(y_train, seasonal=True, m=12)
forecasts = model.predict(n_periods=3)

The main difference is that pmdarima offers automatic ARIMA model selection with auto_arima, while sktime requires manual specification of ARIMA parameters. However, sktime provides a more consistent interface across different time series tasks and algorithms.

Lightning ⚡️ fast forecasting with statistical and econometric models.

Pros of statsforecast

  • Faster execution times, especially for large datasets
  • Supports a wider range of statistical forecasting models
  • Designed for scalability and high-performance computing

Cons of statsforecast

  • Less mature and potentially less stable than pmdarima
  • May have a steeper learning curve for users familiar with pmdarima
  • Documentation might not be as comprehensive as pmdarima's

Code Comparison

statsforecast:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(models=[AutoARIMA()], freq='D')
sf.fit(df)
forecasts = sf.predict(h=30)

pmdarima:

from pmdarima import auto_arima

model = auto_arima(y, seasonal=True, m=12)
forecasts = model.predict(n_periods=30)

Both libraries offer AutoARIMA functionality, but statsforecast is designed for handling multiple time series simultaneously, while pmdarima focuses on single time series analysis. statsforecast's API is more oriented towards batch processing, which can be advantageous for large-scale forecasting tasks. However, pmdarima's syntax might be more intuitive for users familiar with traditional statistical modeling approaches.

Convert Figma logo designs to code with AI

Visual Copilot

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

pmdarima

PyPI version CircleCI Github Actions Status codecov Supported versions Downloads Downloads/Week

Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time series analysis capabilities. This includes:

  • The equivalent of R's auto.arima functionality
  • A collection of statistical tests of stationarity and seasonality
  • Time series utilities, such as differencing and inverse differencing
  • Numerous endogenous and exogenous transformers and featurizers, including Box-Cox and Fourier transformations
  • Seasonal time series decompositions
  • Cross-validation utilities
  • A rich collection of built-in time series datasets for prototyping and examples
  • Scikit-learn-esque pipelines to consolidate your estimators and promote productionization

Pmdarima wraps statsmodels under the hood, but is designed with an interface that's familiar to users coming from a scikit-learn background.

Installation

pip

Pmdarima has binary and source distributions for Windows, Mac and Linux (manylinux) on pypi under the package name pmdarima and can be downloaded via pip:

pip install pmdarima

conda

Pmdarima also has Mac and Linux builds available via conda and can be installed like so:

conda config --add channels conda-forge
conda config --set channel_priority strict
conda install pmdarima

Note: We do not maintain our own Conda binaries, they are maintained at https://github.com/conda-forge/pmdarima-feedstock. See that repo for further documentation on working with Pmdarima on Conda.

Quickstart Examples

Fitting a simple auto-ARIMA on the wineind dataset:

import pmdarima as pm
from pmdarima.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Load/split your data
y = pm.datasets.load_wineind()
train, test = train_test_split(y, train_size=150)

# Fit your model
model = pm.auto_arima(train, seasonal=True, m=12)

# make your forecasts
forecasts = model.predict(test.shape[0])  # predict N steps into the future

# Visualize the forecasts (blue=train, green=forecasts)
x = np.arange(y.shape[0])
plt.plot(x[:150], train, c='blue')
plt.plot(x[150:], forecasts, c='green')
plt.show()
Wineind example

Fitting a more complex pipeline on the sunspots dataset, serializing it, and then loading it from disk to make predictions:

import pmdarima as pm
from pmdarima.model_selection import train_test_split
from pmdarima.pipeline import Pipeline
from pmdarima.preprocessing import BoxCoxEndogTransformer
import pickle

# Load/split your data
y = pm.datasets.load_sunspots()
train, test = train_test_split(y, train_size=2700)

# Define and fit your pipeline
pipeline = Pipeline([
    ('boxcox', BoxCoxEndogTransformer(lmbda2=1e-6)),  # lmbda2 avoids negative values
    ('arima', pm.AutoARIMA(seasonal=True, m=12,
                           suppress_warnings=True,
                           trace=True))
])

pipeline.fit(train)

# Serialize your model just like you would in scikit:
with open('model.pkl', 'wb') as pkl:
    pickle.dump(pipeline, pkl)
    
# Load it and make predictions seamlessly:
with open('model.pkl', 'rb') as pkl:
    mod = pickle.load(pkl)
    print(mod.predict(15))
# [25.20580375 25.05573898 24.4263037  23.56766793 22.67463049 21.82231043
# 21.04061069 20.33693017 19.70906027 19.1509862  18.6555793  18.21577243
# 17.8250318  17.47750614 17.16803394]

Availability

pmdarima is available on PyPi in pre-built Wheel files for Python 3.9+ for the following platforms:

  • Mac (64-bit)
  • Linux (64-bit manylinux)
  • Windows (64-bit)
    • 32-bit wheels are available for pmdarima versions below 2.0.0 and Python versions below 3.10

If a wheel doesn't exist for your platform, you can still pip install and it will build from the source distribution tarball, however you'll need cython>=0.29 and gcc (Mac/Linux) or MinGW (Windows) in order to build the package from source.

Note that legacy versions (<1.0.0) are available under the name "pyramid-arima" and can be pip installed via:

# Legacy warning:
$ pip install pyramid-arima
# python -c 'import pyramid;'

However, this is not recommended.

Documentation

All of your questions and more (including examples and guides) can be answered by the pmdarima documentation. If not, always feel free to file an issue.