
Pandas resampling to the existing index

I have a long time series that ends with the following data.

               ABC     CDE
Date                      
2017-05-26  107.00  241.71
2017-05-30  107.27  241.50
2017-05-31  107.32  241.44
2017-06-01  107.10  243.36
2017-06-02  107.57  244.17

I would like to resample it to monthly data, but I want to keep the actual last date of each month as it appears in the time series. If I do,

df.resample('BM').last()

gives the following tail-end output:

2017-05-31  107.32  241.44 
2017-06-30  107.57  244.17

which does not give the correct last date of the dataframe. Other dates in the resampled dataframe are off as well. Essentially, Pandas isn't using the existing index to find the month end, but its own business-day calendar.
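To make the mismatch concrete, here is a minimal sketch that rebuilds the five rows shown above and resamples them. It passes `pd.offsets.BMonthEnd()` instead of the `'BM'` string, on the assumption that the two produce the same bins (newer pandas releases deprecate the `'BM'` spelling in favour of `'BME'`, so the offset object avoids depending on the alias).

```python
import pandas as pd

# Sample rows copied from the question; the index is named "Date".
df = pd.DataFrame(
    {"ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
     "CDE": [241.71, 241.50, 241.44, 243.36, 244.17]},
    index=pd.DatetimeIndex(
        ["2017-05-26", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02"],
        name="Date"),
)

# Business-month-end bins, built from the offset object rather than the alias.
monthly = df.resample(pd.offsets.BMonthEnd()).last()

# The values come from the last observed row of each month, but the labels
# are calendar business month ends, not dates taken from df.index.
print(monthly.index[-1].date())  # label of the June bin
print(df.index[-1].date())       # last date actually present in the data
```

The June values are those of 2017-06-02, yet the row is labelled with the business month end, which is why the label does not appear in the original index.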

Is there an option I can pass to the Pandas resampling function so that it uses the existing index to achieve the desired result, which is

2017-05-31  107.32  241.44 
2017-06-02  107.57  244.17

You need to create a new column from the index, resample, and finally call set_index on that column:

df = df.assign(Date=df.index).resample('BM').last().set_index('Date')
print (df)
               ABC     CDE
Date                      
2017-05-31  107.32  241.44
2017-06-02  107.57  244.17
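For anyone who wants to run this end to end, the following is a sketch of the same idea on the sample rows from the question. Two things here are assumptions of the sketch rather than part of the answer: the offset object `pd.offsets.BMonthEnd()` stands in for the `'BM'` alias, and a `dropna` step is added because a month with no rows at all would otherwise leave a `NaT` in the new index.

```python
import pandas as pd

# Sample rows copied from the question; the index is named "Date".
df = pd.DataFrame(
    {"ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
     "CDE": [241.71, 241.50, 241.44, 243.36, 244.17]},
    index=pd.DatetimeIndex(
        ["2017-05-26", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02"],
        name="Date"),
)

out = (
    df.assign(Date=df.index)             # copy the real dates into a column
      .resample(pd.offsets.BMonthEnd())  # business-month-end bins
      .last()                            # last row per bin, including its Date
      .dropna(subset=["Date"])           # assumption: skip months without data
      .set_index("Date")                 # put the real dates back as the index
)
print(out)
```

The trick is that `last()` is applied to the copied `Date` column as well, so each bin carries the last date that actually occurred in it.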

But if you only need to group by month period:

m = df.index.to_period('m')
df = df.reset_index().groupby(m).last().set_index('Date')
print (df)
               ABC     CDE
Date                      
2017-05-31  107.32  241.44
2017-06-02  107.57  244.17
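A possible variant of the month-period grouping, not taken from the answer itself, is to call `tail(1)` on the groups. This is only a sketch and assumes the index is sorted by date; it keeps the original rows and index, so no `reset_index`/`set_index` round trip is needed.

```python
import pandas as pd

# Sample rows copied from the question; the index is named "Date".
df = pd.DataFrame(
    {"ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
     "CDE": [241.71, 241.50, 241.44, 243.36, 244.17]},
    index=pd.DatetimeIndex(
        ["2017-05-26", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02"],
        name="Date"),
)

df = df.sort_index()                    # tail(1) relies on chronological order
months = df.index.to_period("M")        # one monthly period per row
last_rows = df.groupby(months).tail(1)  # last row of each month, index untouched
print(last_rows)
```

Because `tail(1)` returns whole rows, a `NaN` in one column on the last day of a month is kept as is, whereas `last()` would take the last non-null value per column.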

You can drop duplicates based on year and month and keep only the last row.

df.assign(m=df.index.to_period('M')).drop_duplicates('m', keep='last').drop(columns='m')
Out[728]: 
               ABC     CDE
Date                      
2017-05-31  107.32  241.44
2017-06-02  107.57  244.17
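The same de-duplication idea can also be sketched without the helper column, by building a boolean mask from the index. This is an illustrative variant rather than the answer's own code, and it assumes the index is sorted by date.

```python
import pandas as pd

# Sample rows copied from the question; the index is named "Date".
df = pd.DataFrame(
    {"ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
     "CDE": [241.71, 241.50, 241.44, 243.36, 244.17]},
    index=pd.DatetimeIndex(
        ["2017-05-26", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02"],
        name="Date"),
)

months = df.index.to_period("M")  # one monthly period per row
# duplicated(keep="last") flags every row except the last one of each month,
# so negating it selects exactly the month-end observations.
is_last_of_month = ~months.duplicated(keep="last")
out = df[is_last_of_month]
print(out)
```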

Or you can group by year and month and then take the last row from each group.

df.reset_index()\
  .sort_values('Date')\
  .assign(m=df.index.to_period('m'))\
  .groupby(by='m')\
  .last()\
  .set_index('Date')
Out[677]: 
               ABC     CDE
Date                      
2017-05-31  107.32  241.44
2017-06-02  107.57  244.17


 