
Finding first and last available days of a month in pandas

I have a pandas DataFrame covering 2007 to 2017. The data looks like this:

date      closing_price
2007-12-03  728.73
2007-12-04  728.83
2007-12-05  728.83
2007-12-07  728.93
2007-12-10  728.22
2007-12-11  728.50
2007-12-12  728.51
2007-12-13  728.65
2007-12-14  728.65
2007-12-17  728.70
2007-12-18  728.73
2007-12-19  728.73
2007-12-20  728.73
2007-12-21  728.52
2007-12-24  728.52
2007-12-26  728.90
2007-12-27  728.90
2007-12-28  728.91
2008-01-05  728.88
2008-01-08  728.86
2008-01-09  728.84
2008-01-10  728.85
2008-01-11  728.85
2008-01-15  728.86
2008-01-16  728.89

As you can see, some days are missing in each month. I want to take the first and last 'available' days of each month, calculate the difference between their closing_price values, and put the results in a new DataFrame. For example, for the first month the days are 2007-12-03 and 2007-12-28, and the closing prices are 728.73 and 728.91, so the result is 0.18. How can I do this?

You can group df by month and apply a function to each group. Note to_period: called through the .dt accessor, it converts the datetime values in the date column to periods of the desired frequency (monthly here), and those periods serve as the group keys.

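def calculate(x):
    # x holds the rows of one month; order them by date so the first and
    # last rows are the earliest and latest available days of that month
    x = x.sort_values("date")
    start_closing_price = x["closing_price"].iloc[0]
    end_closing_price = x["closing_price"].iloc[-1]
    return end_closing_price - start_closing_price

# assumes df["date"] is already datetime64, e.g. via pd.to_datetime
result = df.groupby(df["date"].dt.to_period("M")).apply(calculate)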

# result
date
2007-12    0.18
2008-01    0.01
Freq: M, dtype: float64
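
For anyone who wants to try the month-period grouping end to end, here is a minimal self-contained sketch. It assumes a small DataFrame built from a handful of rows of the question's sample, and it swaps the apply call for the built-in first() and last() aggregations; the names month, grouped and monthly_diff are illustrative only.

```python
import pandas as pd

# A few rows copied from the question's sample (two months, with gaps).
df = pd.DataFrame(
    {
        "date": ["2007-12-03", "2007-12-04", "2007-12-28",
                 "2008-01-05", "2008-01-08", "2008-01-16"],
        "closing_price": [728.73, 728.83, 728.91, 728.88, 728.86, 728.89],
    }
)
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")  # first()/last() follow row order

# Monthly periods act as the group keys.
month = df["date"].dt.to_period("M")
grouped = df.groupby(month)["closing_price"]

# Last available close minus first available close, per month.
# Rounded only to hide floating-point noise in the subtraction.
monthly_diff = (grouped.last() - grouped.first()).round(2)
print(monthly_diff)
```

Sorting before grouping matters here, because first() and last() pick rows by position within each group rather than by date.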

First make sure the dates are datetime and sorted:

import pandas as pd

df['date'] = pd.to_datetime(df.date)
df = df.sort_values('date')

Groupby

gp = df.groupby([df.date.dt.year.rename('year'), df.date.dt.month.rename('month')])
gp.closing_price.last() - gp.closing_price.first()

#year  month
#2007  12       0.18
#2008  1        0.01
#Name: closing_price, dtype: float64

or

# freq='1M' groups by calendar month, labelled by month end; pandas >= 2.2 spells this alias 'ME'
gp = df.groupby(pd.Grouper(key='date', freq='1M'))
gp.last() - gp.first()

#            closing_price
#date                     
#2007-12-31           0.18
#2008-01-31           0.01

Resample

# '1M' resamples to month-end bins; pandas >= 2.2 spells this alias 'ME'
gp = df.set_index('date').resample('1M')
gp.last() - gp.first()

#            closing_price
#date                     
#2007-12-31           0.18
#2008-01-31           0.01
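
Since the question asks for the results in a new DataFrame, it may help to keep the first and last available dates next to the price difference. The following is a rough sketch using named aggregation, assuming the same column names as the question; the output column names (first_day, last_day, first_price, last_price, price_change) are made up for illustration. It groups on monthly periods rather than a frequency string, so it does not depend on which month-end alias the installed pandas version accepts.

```python
import pandas as pd

# A few rows copied from the question's sample (two months, with gaps).
df = pd.DataFrame(
    {
        "date": ["2007-12-03", "2007-12-04", "2007-12-28",
                 "2008-01-05", "2008-01-08", "2008-01-16"],
        "closing_price": [728.73, 728.83, 728.91, 728.88, 728.86, 728.89],
    }
)
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")  # "first"/"last" follow row order

# Group on monthly periods; the key is renamed so the index reads "month".
month = df["date"].dt.to_period("M").rename("month")

summary = df.groupby(month).agg(
    first_day=("date", "first"),
    last_day=("date", "last"),
    first_price=("closing_price", "first"),
    last_price=("closing_price", "last"),
)

# Rounded only to hide floating-point noise in the subtraction.
summary["price_change"] = (summary["last_price"] - summary["first_price"]).round(2)
print(summary)
```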

Problem: Get the first or last available date of each month in a date-indexed dataframe

Solution: Resample the index to find those dates, then extract the matching rows.

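# x is assumed to be a DataFrame (or Series) with a sorted DatetimeIndex.
# On pandas >= 2.2 the month-end alias is 'ME'; older versions use 'M'.
dates  = pd.Series(x.index, index=x.index)
lom    = dates.resample('M').last()    # last available date in each month
xlast  = x[x.index.isin(lom)]          # append .resample('M').last() for a monthly-frequency index
fom    = dates.resample('M').first()   # first available date in each month
xfirst = x[x.index.isin(fom)]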
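
To see the row-selection idea run on its own, here is a small sketch under a few assumptions: x is a date-indexed DataFrame built from some rows of the question's sample, and the monthly grouping is done with groupby on index.to_period("M") instead of resample, purely to sidestep the month-end alias difference between pandas versions. The selection step with isin is the same as above.

```python
import pandas as pd

# Date-indexed frame built from a few rows of the question's sample.
x = pd.DataFrame(
    {"closing_price": [728.73, 728.83, 728.91, 728.88, 728.86, 728.89]},
    index=pd.to_datetime(
        ["2007-12-03", "2007-12-04", "2007-12-28",
         "2008-01-05", "2008-01-08", "2008-01-16"]
    ),
).sort_index()

# Series of the dates themselves, grouped by calendar month.
dates = pd.Series(x.index, index=x.index)
by_month = dates.groupby(x.index.to_period("M"))

fom = by_month.first()  # first available date in each month
lom = by_month.last()   # last available date in each month

# Keep only the rows whose date is a first (or last) available day.
xfirst = x[x.index.isin(fom)]
xlast = x[x.index.isin(lom)]
print(xfirst)
print(xlast)
```

Because xfirst and xlast keep their original dates as the index, subtracting them directly would not align by month; convert both indexes to monthly periods first if a per-month difference is needed.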
