
pandas: retrieve the frequency of a time series

Is there a way to retrieve the frequency of a time series in pandas?

import numpy as np
import pandas as pd
rng = pd.date_range('1/1/2011', periods=72, freq='H')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

Neither ts.frequency nor ts.period is available.

Thanks

Edit: Can we infer the frequency from a time series that does not specify one?

# pandas.io.data was moved out of pandas into the separate pandas-datareader package
import pandas_datareader.data as web
aapl = web.get_data_yahoo("AAPL")

<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-04 00:00:00, ..., 2013-12-19 00:00:00]
Length: 999, Freq: None, Timezone: None

Can we somehow get aapl's frequency? As we know, it's business days.

To infer the frequency, just use the built-in function infer_freq:

import pandas as pd
pd.infer_freq(ts.index)
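As a minimal sketch of what this looks like when the index carries no frequency of its own (the dates below are made up for illustration, and daily spacing is an assumption of the example, not something from the question):

```python
import pandas as pd

# Hypothetical index built from plain date strings, so no freq is attached.
idx = pd.DatetimeIndex(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04"])

print(idx.freq)                # None: nothing was specified at construction
inferred = pd.infer_freq(idx)  # looks at the spacing between timestamps
print(inferred)                # D
```

Note that infer_freq needs at least three timestamps to work with.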

For a DatetimeIndex:

>>> rng
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-01-03 23:00:00]
Length: 72, Freq: H, Timezone: None
>>> len(rng)
72
>>> rng.freq
<1 Hour>
>>> rng.freqstr
'H'

Similarly for a Series indexed with this index:

>>> ts.index.freq
<1 Hour>

@sweetdream's answer is actually pretty good, because the frequency of the data is not always kept as an attribute of the index, so this won't work if the frequency wasn't specified:

df.index.freq

@sweetdream mentioned the infer_freq solution, which once again left me amazed by pandas: it infers the frequency by looking at the index. But sometimes it doesn't work, and there is another way of finding it.

Both should work:

text_freq_of_hourly_data_infer_freq = pd.infer_freq(df.index)
text_freq_of_hourly_data_inferred_freq = df.index.inferred_freq

They should both return 'H', but if the DataFrame is not sorted, the inference fails and returns None, as stated in the documentation. So you should sort the index first.
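A small sketch of the sorting issue, using a made-up four-row daily frame (the column name and dates are hypothetical):

```python
import pandas as pd

# Hypothetical daily data whose rows arrived out of order.
df = pd.DataFrame(
    {"value": [3, 1, 4, 2]},
    index=pd.to_datetime(["2011-01-03", "2011-01-01", "2011-01-04", "2011-01-02"]),
)

unsorted_freq = pd.infer_freq(df.index)              # None: index is not monotonic
sorted_freq = pd.infer_freq(df.sort_index().index)   # 'D' once the index is ordered
print(unsorted_freq, sorted_freq)
```

df.sort_index().index.inferred_freq gives the same result as the second call.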

And don't forget to pass it the index, not the DataFrame or Series itself. Given a Series, it infers from the values instead of the index, as the documentation says:

If passed a Series will use the values of the series (NOT THE INDEX).

References:
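To make the distinction concrete, here is a hedged sketch with two hypothetical Series: one keeps its timestamps in the index, the other keeps them in the values.

```python
import numpy as np
import pandas as pd

stamps = pd.date_range("2011-01-01", periods=5, freq="D")

# Timestamps live in the index; the values are ordinary floats.
ts = pd.Series(np.arange(5.0), index=stamps)
from_index = pd.infer_freq(ts.index)   # pass the index explicitly

# Timestamps live in the values; the index is a default integer range.
events = pd.Series(stamps)
from_values = pd.infer_freq(events)    # a Series is read by its values

print(from_index, from_values)
```

For the first kind of Series, pass ts.index rather than ts, since the float values carry no date information.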

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.inferred_freq.html?highlight=infer_freq

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.infer_freq.html?highlight=infer_freq

If your index is datetime64 but has no frequency associated with it, the methods mentioned above may return None.

I propose a rudimentary method to approximate the index frequency:

With ts being a pandas.Series:

# mean spacing between consecutive timestamps of the index (not the values)
abs(np.diff(ts.index)).mean()
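The same idea can be sketched with pandas alone. The index below is a made-up run of business days with one weekend gap, and the median is my own addition for comparison, not part of the answer above:

```python
import pandas as pd

# Hypothetical business-day index with a weekend gap (2013-12-14 and 2013-12-15).
idx = pd.DatetimeIndex([
    "2013-12-09", "2013-12-10", "2013-12-11", "2013-12-12",
    "2013-12-13", "2013-12-16", "2013-12-17",
])

# Gaps between consecutive timestamps, as Timedelta values.
gaps = idx.to_series().diff().dropna()

mean_gap = gaps.mean()      # pulled above one day by the weekend gap
median_gap = gaps.median()  # the typical step between rows
print(mean_gap, median_gap)
```

With irregular gaps such as weekends or holidays, the mean drifts away from the nominal step, so the median may be the more useful approximation.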
