
Statsmodel using ARMA

I'm a bit new here, but I'm trying to get a statsmodels ARMA prediction tool to work. I've imported some stock data from Yahoo and gotten the ARMA model to give me fitted parameters. However, when I call predict, all I get is a traceback that I can't figure out. I'm not sure what I'm doing wrong here:

import pandas
import statsmodels.tsa.api as tsa
from pandas.io.data import DataReader

start = pandas.datetime(2013,1,1)
end = pandas.datetime.today()

data = DataReader('GOOG','yahoo')
arma =tsa.ARMA(data['Close'], order =(2,2))
results= arma.fit()
results.predict(start=start,end=end)

The errors are:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Windows\system32\<ipython-input-84-25a9b6bc631d> in <module>()
     13 results= arma.fit()
     14 results.summary()
---> 15 results.predict(start=start,end=end)

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\base\wrapper.pyc in wrapper(self, *args, **kwargs)
     88         results = object.__getattribute__(self, '_results')
     89         data = results.model.data
---> 90         return data.wrap_output(func(results, *args, **kwargs), how)
     91
     92     argspec = inspect.getargspec(func)

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in predict(self, start, end, exog, dynamic)
   1265
   1266         """
-> 1267         return self.model.predict(self.params, start, end, exog, dynamic)
   1268
   1269     def forecast(self, steps=1, exog=None, alpha=.05):

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in predict(self, params, start, end, exog, dynamic)
    497
    498         # will return an index of a date
--> 499         start = self._get_predict_start(start, dynamic)
    500         end, out_of_sample = self._get_predict_end(end, dynamic)
    501         if out_of_sample and (exog is None and self.k_exog > 0):

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in _get_predict_start(self, start, dynamic)
    404             #elif 'mle' not in method or dynamic: # should be on a date
    405             start = _validate(start, k_ar, k_diff, self.data.dates,
--> 406                               method)
    407             start = super(ARMA, self)._get_predict_start(start)
    408         _check_arima_start(start, k_ar, k_diff, method, dynamic)

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in _validate(start, k_ar, k_diff, dates, method)
    160     if isinstance(start, (basestring, datetime)):
    161         start_date = start
--> 162         start = _index_date(start, dates)
    163         start -= k_diff
    164     if 'mle' not in method and start < k_ar - k_diff:

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in _index_date(date, dates)
     37         freq = _infer_freq(dates)
     38         # we can start prediction at the end of endog
---> 39         if _idx_from_dates(dates[-1], date, freq) == 1:
     40             return len(dates)
     41

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in _idx_from_dates(d1, d2, freq)
     70         from pandas import DatetimeIndex
     71         return len(DatetimeIndex(start=d1, end=d2,
---> 72                                  freq = _freq_to_pandas[freq])) - 1
     73     except ImportError, err:
     74         from pandas import DateRange

D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in __getitem__(self, key)
     11         # being lazy, don't want to replace dictionary below
     12         def __getitem__(self, key):
---> 13             return get_offset(key)
     14     _freq_to_pandas = _freq_to_pandas_class()
     15 except ImportError, err:

D:\Python27\lib\site-packages\pandas\tseries\frequencies.pyc in get_offset(name)
    484     """
    485     if name not in _dont_uppercase:
--> 486         name = name.upper()
    487
    488         if name in _rule_aliases:

AttributeError: 'NoneType' object has no attribute 'upper'

Looks like a bug to me. I'll look into it.

https://github.com/statsmodels/statsmodels/issues/712

Edit: As a workaround, you can drop the DatetimeIndex and pass the model the underlying numpy array instead. This makes prediction a little trickier date-wise, but using dates for prediction is already tricky when the index has no frequency, so a start and end date alone are essentially meaningless.

import pandas
import statsmodels.tsa.api as tsa
from pandas.io.data import DataReader

data = DataReader('GOOG','yahoo')
dates = data.index

# start at a date on the index
start = dates.get_loc(pandas.datetools.parse("1-2-2013"))
end = start + 30 # "steps"

# NOTE THE .values
arma =tsa.ARMA(data['Close'].values, order =(2,2))
results= arma.fit()
results.predict(start, end)
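The date-to-position lookup in the workaround can be tried on its own, without downloading any data. The sketch below is only an illustration: the `dates` index is a made-up stand-in for `data.index` (business days with two holidays dropped), and it uses `pd.Timestamp` in place of `pandas.datetools.parse`, which is an assumption about what newer pandas versions expect rather than part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for data.index: business days with two market
# holidays removed, so the index has gaps like real trading dates do.
dates = pd.bdate_range("2012-12-24", "2013-03-29")
dates = dates.drop(pd.to_datetime(["2012-12-25", "2013-01-01"]))

# Translate a date that exists in the index into its integer position.
start = dates.get_loc(pd.Timestamp("2013-01-02"))
end = start + 30  # number of steps to predict past the start position

print(start, dates[start])
```

The resulting `start` and `end` are plain integers, which is the form `results.predict(start, end)` accepts when the model was fitted on `.values`.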

When I run your code, I get:

"ValueError: There is no frequency for these dates and date 2013-01-01 00:00:00 is not in dates index. Try giving a date that is in the dates index or use an integer"

Since trading dates occur at uneven intervals (holidays and weekends), the model cannot infer a frequency from the index, and 2013-01-01 was a market holiday, so it is not in the index either.
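The missing frequency is easy to reproduce with pandas alone. This is a small sketch under stated assumptions: the two indexes are synthetic (a full run of business days versus the same run with one holiday removed), not the actual Yahoo data, and `pd.infer_freq` is used here only to illustrate the idea, not as the exact check statsmodels performs internally.

```python
import pandas as pd

# A gap-free run of business days: pandas can recognise the pattern.
full = pd.bdate_range("2013-01-02", "2013-01-31")

# Same range with one market holiday removed, as in real trading data.
trading = full.drop([pd.Timestamp("2013-01-21")])

freq_full = pd.infer_freq(full)
freq_trading = pd.infer_freq(trading)

print("full business days:", freq_full)
print("with a holiday gap:", freq_trading)
```

Once a single holiday breaks the pattern, no frequency can be inferred, which is why a date that is not already in the index cannot be mapped to a position.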

If you replace the dates with their integer locations in the index, you get your predictions. You can then simply put the original index back on the results.

prediction = results.predict(start=0, end=len(data) - 1)
prediction.index = data.index
print(prediction)

2010-01-04    689.507451
2010-01-05    627.085986
2010-01-06    624.256331
2010-01-07    608.133481
...
2017-05-09    933.700555
2017-05-10    931.290023
2017-05-11    927.781427
2017-05-12    929.661014

As an aside, you may want to run a model like this on the daily returns rather than on the raw prices. Running it on the raw prices isn't going to capture momentum and mean reversion the way you probably think it would. Your model is being built off the absolute level of the prices, not the change in prices, momentum, moving averages, or other factors you probably want to use. The predictions you're creating will look pretty good because they only predict one step ahead, so they don't show the compounding error. This confuses a lot of people. The errors will look small relative to the absolute value of the stock price, but the model won't be very predictive.
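For reference, here is one way to turn a price series into daily returns before fitting anything. It is a minimal sketch: the four prices are made-up numbers standing in for `data['Close']`, and whether simple or log returns suit your model better is a judgment call this example does not settle.

```python
import numpy as np
import pandas as pd

# Made-up closing prices standing in for data['Close'].
close = pd.Series([100.0, 102.0, 101.0, 103.5], name="Close")

# Simple returns: (p_t - p_{t-1}) / p_{t-1}; the first value is NaN, so drop it.
simple_returns = close.pct_change().dropna()

# Log returns: log(p_t) - log(p_{t-1}); these add up across days.
log_returns = np.log(close).diff().dropna()

print(simple_returns)
print(log_returns)
```

Either series could then be passed to the model in place of the raw prices, for example as `simple_returns.values`.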

I'd suggest reading through this walkthrough as a starting point:

http://www.johnwittenauer.net/a-simple-time-series-analysis-of-the-sp-500-index/
