Select first/15th day of month plus the day before and after in python

I would like to select specific days in a table to calculate the mean for each specific group. My table has around 9000 rows like these: Example Data

I would like to select only one value for each of the following: the first value of a month, the last value of a month, the second value of a month, every 15th, the day before the 15th, and the day after the 15th.

The purpose is to calculate the mean for every specific group.

The result should look like this: Result

I am struggling with the calculation for the 15th and the days before and after it, as well as the value "after the first" (the second value of a month).

What I tried so far is:

import pandas as pd
df = pd.read_csv(...)  # file path not given in the original post
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

"Average first of month"
dffirst = df[~df.index.to_period('m').duplicated()]
monthly_first = dffirst['Value'].mean()

"Average last of month"
dflast = df.resample("M").max()
monthly_last = dflast['Value'].mean()

Thank you

As far as I understand, some dates could be missing, which makes it a little more complicated.

What I would do is track the positional indices of the first and last dates available in each month and go from there, i.e. first indices + 1 to get the second available date and first indices + 14 to get the 15th available date. The calculation of the average value is then straightforward.

However, you have to make sure that the shifted indices exist (e.g. no negative index, no index exceeding the length of your dataframe).
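The bounds check described above could be wrapped in a small helper. This is only a sketch: the function name `shift_indices` and the sample positions are made up for illustration and are not part of the original answer.

```python
import numpy as np

def shift_indices(indices, offset, length):
    """Shift positional indices by `offset` and drop any that fall outside [0, length)."""
    shifted = np.asarray(indices) + offset
    return shifted[(shifted >= 0) & (shifted < length)]

# Hypothetical positions of the first available row of three months in a 50-row frame
first = np.array([0, 20, 41])
fifteenth = shift_indices(first, 14, 50)      # 41 + 14 = 55 is out of range and dropped
before_first = shift_indices(first, -1, 50)   # 0 - 1 = -1 is out of range and dropped
print(fifteenth, before_first)
```

Checking both bounds in one place avoids the silent wrap-around that a negative position would cause with `iloc`.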

For the code below, I assume that the dates are in the index.

# Positions of the first available date in each month.
# df.resample("MS").mean().index holds the month-start labels; get_indexer with
# method="bfill" returns the position of the next available date on or after each label.
# Index.get_loc(..., method=...) was removed in pandas 2.0, so get_indexer is used instead;
# it needs a sorted, unique index and returns a NumPy array.
month_starts = df.resample("MS").mean().index
indices_first = df.index.get_indexer(month_starts, method="bfill")

# Positions of the last available date in each month:
# month-end labels ("M"; spelled "ME" from pandas 2.2 on) with method="ffill".
month_ends = df.resample("M").mean().index
indices_last = df.index.get_indexer(month_ends, method="ffill")

# to get indices of 15th dates available
indices_15 = indices_first + 14
indices_15 = indices_15[indices_15 < len(df)]

# to get indices before last
indices_before_last = indices_last - 1
indices_before_last = indices_before_last[indices_before_last >= 0]

You can then access the corresponding rows of your dataframe:

avg_first = df.iloc[indices_first]['Value'].mean()
avg_15th = df.iloc[indices_15]['Value'].mean()
avg_before_last = df.iloc[indices_before_last]['Value'].mean()
avg_last = df.iloc[indices_last]['Value'].mean()
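Note that first indices + 14 picks the 15th available row of a month, which is the calendar 15th only when no earlier dates are missing. If the calendar 15th is what is wanted, one possible variation is to look up the first available date on or after the 15th and step one position to either side. The sketch below is an assumption about the intended grouping: it uses synthetic business-day data in place of the asker's table, and the names `idx_15`, `keep_valid` and `group_means` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's table: business days only, so weekends are missing
dates = pd.bdate_range("2021-01-01", "2021-03-31")
df = pd.DataFrame({"Value": np.arange(len(dates), dtype=float)}, index=dates)

# One label per month that actually occurs in the data
months = df.index.to_period("M").unique()
month_starts = months.to_timestamp(how="start")
month_ends = months.to_timestamp(how="end")
day_15 = month_starts + pd.Timedelta(days=14)  # calendar 15th of each month

# Positions of the first/last available row per month and of the
# first available row on or after the 15th (-1 means "no such row")
idx_first = df.index.get_indexer(month_starts, method="bfill")
idx_last = df.index.get_indexer(month_ends, method="ffill")
idx_15 = df.index.get_indexer(day_15, method="bfill")
idx_15 = idx_15[idx_15 >= 0]  # drop misses before shifting

def keep_valid(idx):
    """Drop positions that fall outside the frame."""
    idx = np.asarray(idx)
    return idx[(idx >= 0) & (idx < len(df))]

values = df["Value"]
group_means = {
    "first": values.iloc[keep_valid(idx_first)].mean(),
    "second": values.iloc[keep_valid(idx_first + 1)].mean(),
    "before_15": values.iloc[keep_valid(idx_15 - 1)].mean(),
    "on_15": values.iloc[keep_valid(idx_15)].mean(),
    "after_15": values.iloc[keep_valid(idx_15 + 1)].mean(),
    "last": values.iloc[keep_valid(idx_last)].mean(),
}
print(group_means)
```

With very sparse data a shifted position can land in a neighbouring month, so an extra check that the shifted row still belongs to the same month may be needed.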
