Python's pmdarima: Introduction, Installation, and Detailed Usage Guide

2023-05-27 01:06 | Source: compiled from the web

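Introduction to pmdarima

Installing pmdarima

Using pmdarima

1. Basic usage

(1) Fitting a simple auto-ARIMA model on the wineind dataset

(2) Fitting a more complex pipeline on the sunspots dataset, serializing it, and loading it from disk to make predictions

2. Advanced usage

(1) Loading the lynx dataset, fitting an ARIMA model with auto_arima, and forecasting 10 time steps ahead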

Introduction to pmdarima

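pmdarima is a statistical library designed to fill the gap in Python's time series analysis capabilities. It uses statsmodels under the hood, but its interface is designed to feel familiar to users coming from scikit-learn.

pmdarima (formerly pyramid-arima) automates ARIMA model fitting. ARIMA (AutoRegressive Integrated Moving Average) is a widely used model for analyzing and forecasting time series data. pmdarima selects the ARIMA orders automatically: the AR (autoregressive) order, the MA (moving average) order, and the differencing order. This removes much of the tedious manual work of using ARIMA models. Rather than fitting every possible model, auto_arima searches candidate orders within user-set bounds (stepwise by default, or over a grid when stepwise=False) and keeps the model with the best information criterion, AIC by default.

Its features include:
>> An equivalent of R's auto.arima
>> A collection of statistical tests of stationarity and seasonality
>> Time series utilities, such as differencing and inverse differencing
>> Numerous endogenous and exogenous transformers and featurizers, including Box-Cox and Fourier transformations
>> Seasonal time series decomposition
>> Cross-validation utilities
>> A rich collection of built-in time series datasets for prototyping and examples
>> scikit-learn-style pipelines to consolidate estimators and ease the move to production

Beyond automated order selection, pmdarima also offers:
>> ARIMA model fitting with exogenous variables
>> Data transformers for preprocessing and feature engineering
>> Cross-validation and hyperparameter optimization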

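In short, pmdarima is a convenient, easy-to-use Python library for analyzing and forecasting time series data. It simplifies working with ARIMA models through automated order selection and its other features.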

GitHub repository: alkaline-ml/pmdarima (A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.)
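The feature list mentions differencing and inverse differencing. As a rough illustration of what those two operations do, here is a hand-written sketch in plain Python. It is not pmdarima's own implementation, and the names `difference` and `undifference` are made up for this example.

```python
def difference(series, lag=1):
    """Return the lag-`lag` differences: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(diffs, head, lag=1):
    """Invert `difference`, given the first `lag` original values (`head`)."""
    restored = list(head)
    for d in diffs:
        # Each original value is its difference plus the value `lag` steps back.
        restored.append(d + restored[-lag])
    return restored


# Toy series, chosen only for illustration.
y = [3, 5, 4, 8, 7, 11]
dy = difference(y)                 # first differences
y_back = undifference(dy, y[:1])   # recovers the original series
print(dy)
print(y_back)
```

pmdarima ships its own helpers for this in `pmdarima.utils` (`diff` and `diff_inv`); the sketch only shows the idea behind them.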

Installing pmdarima

    pip install pmdarima

To install from the Tsinghua PyPI mirror instead:

    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pmdarima
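After running pip, it can help to confirm that the package is visible to the interpreter you intend to use. The following is a small sketch using only the standard library; the helper name `installed_version` is an assumption made for this example, not part of pmdarima.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. stdlib modules).
        return "unknown"


print("pmdarima:", installed_version("pmdarima") or "not installed")
```

If this prints "not installed", the pip command most likely ran against a different Python environment.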

Using pmdarima

1. Basic usage

(1) Fitting a simple auto-ARIMA model on the wineind dataset

    import pmdarima as pm
    from pmdarima.model_selection import train_test_split
    import numpy as np
    import matplotlib.pyplot as plt

    # Load/split your data
    y = pm.datasets.load_wineind()
    train, test = train_test_split(y, train_size=150)

    # Fit your model
    model = pm.auto_arima(train, seasonal=True, m=12)

    # Make your forecasts
    forecasts = model.predict(test.shape[0])  # predict N steps into the future

    # Visualize the forecasts (blue=train, green=forecasts)
    x = np.arange(y.shape[0])
    plt.plot(x[:150], train, c='blue')
    plt.plot(x[150:], forecasts, c='green')
    plt.show()
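The example above plots the forecasts but does not score them against the held-out test set. A minimal sketch of two common error measures follows, written in plain Python so it runs without any dependencies. The function names and the toy numbers are assumptions for illustration; they are not output from the wineind model.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Treat a perfect zero/zero match as zero error.
        terms.append(0.0 if denom == 0 else 200.0 * abs(p - a) / denom)
    return sum(terms) / len(terms)


# Toy values, made up for this example.
actual = [100.0, 110.0, 120.0]
predicted = [90.0, 110.0, 130.0]
print(round(rmse(actual, predicted), 3))
print(round(smape(actual, predicted), 3))
```

With the variables from the example above, the call would be `rmse(test, forecasts)`. pmdarima also provides an sMAPE implementation in `pmdarima.metrics`.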

(2) Fitting a more complex pipeline on the sunspots dataset, serializing it, and loading it from disk to make predictions

    import pmdarima as pm
    from pmdarima.model_selection import train_test_split
    from pmdarima.pipeline import Pipeline
    from pmdarima.preprocessing import BoxCoxEndogTransformer
    import pickle

    # Load/split your data
    y = pm.datasets.load_sunspots()
    train, test = train_test_split(y, train_size=2700)

    # Define and fit your pipeline
    pipeline = Pipeline([
        ('boxcox', BoxCoxEndogTransformer(lmbda2=1e-6)),  # lmbda2 avoids negative values
        ('arima', pm.AutoARIMA(seasonal=True, m=12,
                               suppress_warnings=True,
                               trace=True))
    ])

    pipeline.fit(train)

    # Serialize your model just like you would in scikit:
    with open('model.pkl', 'wb') as pkl:
        pickle.dump(pipeline, pkl)

    # Load it and make predictions seamlessly:
    with open('model.pkl', 'rb') as pkl:
        mod = pickle.load(pkl)
        print(mod.predict(15))
    # [25.20580375 25.05573898 24.4263037  23.56766793 22.67463049 21.82231043
    #  21.04061069 20.33693017 19.70906027 19.1509862  18.6555793  18.21577243
    #  17.8250318  17.47750614 17.16803394]
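The pipeline passes `lmbda2=1e-6` to the Box-Cox transformer. To show why a small shift matters when a series contains zeros, here is a hand-written sketch of the two-parameter Box-Cox transform and its inverse. It is an illustration only, not pmdarima's implementation; `boxcox` and `inv_boxcox` are names chosen for this example, and the power parameter `lmbda` is passed in explicitly to keep the sketch short.

```python
import math


def boxcox(y, lmbda, lmbda2=0.0):
    """Two-parameter Box-Cox: shift by lmbda2, then apply the power transform."""
    shifted = [v + lmbda2 for v in y]
    if any(v <= 0 for v in shifted):
        raise ValueError("all values must be positive after adding lmbda2")
    if lmbda == 0:
        return [math.log(v) for v in shifted]
    return [(v ** lmbda - 1.0) / lmbda for v in shifted]


def inv_boxcox(z, lmbda, lmbda2=0.0):
    """Undo `boxcox`, returning values on the original scale."""
    if lmbda == 0:
        return [math.exp(v) - lmbda2 for v in z]
    return [(lmbda * v + 1.0) ** (1.0 / lmbda) - lmbda2 for v in z]


# Toy series containing a zero, as a count series might.
y = [0.0, 1.0, 4.0]
z = boxcox(y, lmbda=0.5, lmbda2=1e-6)
back = inv_boxcox(z, lmbda=0.5, lmbda2=1e-6)
print(z)
print(back)
```

Without the shift, `boxcox(y, 0.5)` raises a `ValueError` on the zero entry, which is the situation the small `lmbda2` guards against.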

2. Advanced usage

(1) Loading the lynx dataset, fitting an ARIMA model with auto_arima, and forecasting 10 time steps ahead

    import pmdarima as pm
    from pmdarima import model_selection
    import matplotlib.pyplot as plt
    import numpy as np

    # Load the data and split it into separate pieces
    data = pm.datasets.load_lynx()
    train, test = model_selection.train_test_split(data, train_size=100)

    # Fit with some validation (cv) samples
    arima = pm.auto_arima(train, start_p=1, start_q=1, d=0, max_p=5, max_q=5,
                          out_of_sample_size=10, suppress_warnings=True,
                          stepwise=True, error_action='ignore')

    # Now plot the results and the forecast for the test set
    preds, conf_int = arima.predict(n_periods=test.shape[0],
                                    return_conf_int=True)

    fig, axes = plt.subplots(2, 1, figsize=(12, 8))
    x_axis = np.arange(train.shape[0] + preds.shape[0])
    axes[0].plot(x_axis[:train.shape[0]], train, alpha=0.75)
    axes[0].scatter(x_axis[train.shape[0]:], preds, alpha=0.4, marker='o')
    axes[0].scatter(x_axis[train.shape[0]:], test, alpha=0.4, marker='x')
    axes[0].fill_between(x_axis[-preds.shape[0]:], conf_int[:, 0], conf_int[:, 1],
                         alpha=0.1, color='b')

    # Fill the section where we "held out" samples in our model fit
    axes[0].set_title("Train samples & forecasted test samples")

    # Now add the actual samples to the model and create NEW forecasts
    arima.update(test)
    new_preds, new_conf_int = arima.predict(n_periods=10, return_conf_int=True)
    new_x_axis = np.arange(data.shape[0] + 10)

    axes[1].plot(new_x_axis[:data.shape[0]], data, alpha=0.75)
    axes[1].scatter(new_x_axis[data.shape[0]:], new_preds, alpha=0.4, marker='o')
    axes[1].fill_between(new_x_axis[-new_preds.shape[0]:],
                         new_conf_int[:, 0], new_conf_int[:, 1],
                         alpha=0.1, color='g')
    axes[1].set_title("Added new observed values with new forecasts")
    plt.show()
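The example above feeds the whole test set to `update` in one call. A variation is to update one observation at a time and forecast a single step before each update, which gives rolling one-step-ahead forecasts. The sketch below assumes only that the model exposes `predict(n_periods=...)` and `update(y)`, as the fitted model above is used. `rolling_forecast` and the stand-in class `LastValueModel` are hypothetical names introduced so the example runs without pmdarima.

```python
def rolling_forecast(model, observations):
    """One-step-ahead forecasts, feeding each real observation back in."""
    forecasts = []
    for obs in observations:
        step = model.predict(n_periods=1)       # forecast the next point only
        forecasts.append(float(list(step)[0]))
        model.update([obs])                     # then reveal the true value
    return forecasts


class LastValueModel:
    """Hypothetical stand-in for a fitted model: predicts the last value seen."""

    def __init__(self, history):
        self.history = list(history)

    def predict(self, n_periods=1):
        return [self.history[-1]] * n_periods

    def update(self, y):
        self.history.extend(y)
        return self


# Toy numbers, made up for this example.
result = rolling_forecast(LastValueModel([10.0, 12.0]), [11.0, 15.0, 14.0])
print(result)
```

With the variables from the lynx example, the call would be `rolling_forecast(arima, test)` on a freshly fitted model, in place of the single `arima.update(test)`.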
