
Statsmodels ARIMA date index frequency

I have a pandas DataFrame with a datetime index whose frequency is set to 'C' (custom business day):

ipdb>  data.index
DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
               '2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
               '2021-03-17', '2021-03-18',
               ...
               '2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
               '2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
               '2021-11-18', '2021-11-19'],
              dtype='datetime64[ns]', name='mktDates', length=180, freq='C')

The index was created with the pandas bdate_range function:

import datetime as dat
import pandas as pd

# load the market holidays and parse the date column
holidays = pd.read_csv('../data/raw/market_holidays.csv', parse_dates=True, infer_datetime_format=True)
holidays = pd.to_datetime(holidays['date_YYYY_MM_DD'], format='%Y-%m-%d')

sttDate = dat.datetime(2013, 1, 1)
stpDate = dat.datetime(2021, 12, 31)

# build the calendar
mktCalendar = pd.bdate_range(start=sttDate, end=stpDate, holidays=holidays.values, freq='C').rename('mktDates')

I am trying to fit an ARIMA model with statsmodels using the following code:

import statsmodels.api as sm
thisOrder = (1, 1, 1)
arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')

The last line throws an exception:

<ipython-input-392-acbc7f25591c> in ARIMASimulate(data, simParams, randSeed, verbose)
     27         # fit and get the score
     28         ipdb.set_trace()
---> 29         arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\model.py in __init__(self, endog, exog, order, seasonal_order, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
    107     >>> print(res.summary())
    108     """
--> 109     def __init__(self, endog, exog=None, order=(0, 0, 0),
    110                  seasonal_order=(0, 0, 0, 0), trend=None,
    111                  enforce_stationarity=True, enforce_invertibility=True,

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\specification.py in __init__(self, endog, exog, order, seasonal_order, ar_order, diff, ma_order, seasonal_ar_order, seasonal_diff, seasonal_ma_order, seasonal_periods, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
    444         # especially validating shapes, retrieving names, and potentially
    445         # providing us with a time series index
--> 446         self._model = TimeSeriesModel(endog, exog=exog, dates=dates, freq=freq,
    447                                       missing=missing)
    448         self.endog = None if faux_endog else self._model.endog

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in __init__(self, endog, exog, dates, freq, missing, **kwargs)
    413 
    414         # Date handling in indexes
--> 415         self._init_dates(dates, freq)
    416 
    417     def _init_dates(self, dates=None, freq=None):

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in _init_dates(self, dates, freq)
    555                 elif (freq is not None and not inferred_freq and
    556                         not (index.freq == freq)):
--> 557                     raise ValueError('The given frequency argument is'
    558                                      ' incompatible with the given index.')
    559             # Finally, raise an exception if we could not coerce to date-based

ValueError: The given frequency argument is incompatible with the given index

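I don't understand this, because the frequency argument is the same as the frequency of the data index. I also know that the index is not missing any dates for that frequency. I have statsmodels 0.12.1. Any idea what is going on here?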

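Try generating a DatetimeIndex from 2021-03-05 to 2021-11-19 with freq='C': it has length 186. Your index has length 180, so it is missing 6 dates relative to the plain 'C' calendar (which excludes weekends but no holidays):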

import pandas as pd

date_range = pd.date_range(
    start='2021-03-05',
    end='2021-11-19',
    freq='C'
)

print(date_range)

DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
               '2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
               '2021-03-17', '2021-03-18',
               ...
               '2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
               '2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
               '2021-11-18', '2021-11-19'],
              dtype='datetime64[ns]', length=186, freq='C')

Using this date_range with ARIMA raises no error:

import numpy as np
import statsmodels.api as sm

x = np.linspace(0, 2*np.pi, date_range.size)
y = np.sin(4*np.pi*x)

data = pd.DataFrame({
    'Y': y,
}, index=date_range)

thisOrder = (1, 1, 1)
arima = sm.tsa.arima.ARIMA(
    endog=data, order=thisOrder,
    freq='C'
)

So you may need to check your DataFrame index.
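
One way to run that check is sketched below. It is only an illustration: the three holiday dates are made up (the real ones live in the asker's CSV), and the names custom_index, default_index, missing and offsets_match are hypothetical. The sketch lists the dates that the plain 'C' calendar contains but the holiday-aware index does not, and compares the two frequency offsets directly, since the traceback shows statsmodels testing index.freq == freq.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical holidays, for illustration only (all three fall on weekdays).
holidays = ['2021-04-02', '2021-05-31', '2021-07-05']

# Index built the way the question builds it: custom business days minus holidays.
custom_index = pd.bdate_range(start='2021-03-05', end='2021-11-19',
                              freq='C', holidays=holidays)

# Index built from the bare 'C' alias: weekdays only, no holidays removed.
default_index = pd.date_range(start='2021-03-05', end='2021-11-19', freq='C')

# Dates present in the plain calendar but absent from the holiday-aware one.
missing = default_index.difference(custom_index)
print(len(default_index), len(custom_index))
print(missing)

# Both indexes display freq='C', but the underlying offset objects carry
# different holiday lists, so they may not compare equal.
offsets_match = custom_index.freq == to_offset('C')
print(offsets_match)
```

If the missing dates turn out to be exactly your holidays, the index itself is consistent and the mismatch comes from the holiday list attached to its frequency. In that case it may be worth trying the model without the freq argument, or reindexing the data onto the plain 'C' calendar; neither option is confirmed by the original answer.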

