
Select first/15th day of month plus the day before and after in Python

I would like to select specific days in a table to calculate the mean for each specific group. My table has around 9000 lines like these: Example Data

I would like to select only one value for each of the following:
- the first value of a month
- the last value of a month
- the second value of a month
- every 15th
- the day before the 15th
- the day after the 15th

The purpose is to calculate the mean for every specific group.

The result should look like this: Result

I am struggling with the calculation for the 15th and the days before and after it, as well as "after the first".

What I tried so far is:

import pandas as pd
df = pd.read_csv("data.csv")  # placeholder path; the post does not give the file name
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Average first of month
dffirst = df[~df.index.to_period('M').duplicated()]
monthly_first = dffirst['Value'].mean()

# Average last of month
# (last() takes the last row of each month; max() would take the largest value instead)
dflast = df.resample("M").last()
monthly_last = dflast['Value'].mean()

Thank you

As far as I understand, some dates could be missing, which makes it a little more complicated.

What I would do is track the indices of the first and last dates available in each month and go from there, i.e. first indices + 1 to get the second available date, and first indices + 14 to get the 15th available date. Calculating the average value is then straightforward.

However, you have to make sure that the shifted indices exist (e.g. no negative index, no index exceeding the length of your DataFrame).

For the code below, I assume that the dates are in the index column.

# get indices of first dates available
# month starts come from df.resample("MS").mean().index
# get_indexer with method="bfill" returns the position of the next value available in the dataframe
# (the index must be sorted; Index.get_loc no longer accepts method= since pandas 2.0)
indices_first = df.index.get_indexer(df.resample("MS").mean().index, method="bfill")

# get indices of last dates available
# method is here "ffill" and resample("M")
indices_last = df.index.get_indexer(df.resample("M").mean().index, method="ffill")

# to get indices of 15th dates available
indices_15 = indices_first + 14
indices_15 = indices_15[indices_15 < len(df)]

# to get indices before last
indices_before_last = indices_last - 1
indices_before_last = indices_before_last[indices_before_last >= 0]

You can then access the corresponding rows of your dataframe:

avg_first = df.iloc[indices_first]['Value'].mean()
avg_15th = df.iloc[indices_15]['Value'].mean()
avg_before_last = df.iloc[indices_before_last]['Value'].mean()
avg_last = df.iloc[indices_last]['Value'].mean()
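
The same index-shifting idea can be extended to all six groups from the question. The sketch below is only one possible reading: it runs on made-up business-day data (an assumption, not the asker's table), treats "the 15th" as the first available date on or after the calendar 15th, and takes "before" and "after" as the neighbouring rows. The helper `mean_at` is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic business-day data standing in for the real table (assumption).
dates = pd.bdate_range("2023-01-02", "2023-03-31")
df = pd.DataFrame({"Value": np.arange(len(dates), dtype=float)}, index=dates)

months = df.index.to_period("M").unique()        # one Period per month present
month_start = months.to_timestamp(how="start")   # e.g. 2023-01-01
month_end = months.to_timestamp(how="end")       # last instant of the month
day_15 = month_start + pd.Timedelta(days=14)     # calendar 15th of each month

# Row positions: first/last available date, and first date on or after the 15th.
idx_first = df.index.get_indexer(month_start, method="bfill")
idx_last = df.index.get_indexer(month_end, method="ffill")
idx_15 = df.index.get_indexer(day_15, method="bfill")


def mean_at(frame, positions):
    """Mean of 'Value' at the given row positions, ignoring out-of-range ones."""
    positions = np.asarray(positions)
    valid = positions[(positions >= 0) & (positions < len(frame))]
    return frame["Value"].iloc[valid].mean()


# get_indexer marks "no match" with -1, so drop those before shifting.
found_15 = idx_15[idx_15 >= 0]

averages = {
    "first": mean_at(df, idx_first),
    "second": mean_at(df, idx_first + 1),
    "before_15th": mean_at(df, found_15 - 1),
    "15th": mean_at(df, found_15),
    "after_15th": mean_at(df, found_15 + 1),
    "last": mean_at(df, idx_last),
}
print(averages)
```

Note that a shifted position can cross into a neighbouring month when a month has very few rows; if that matters for your data, compare the month of the shifted row with the month of the original one before averaging.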

