Select first/15th day of month plus the day before and after in python

Question

I would like to select specific days in a table to calculate the mean for each specific group. My table has around 9000 rows like these:

[Image: Example Data]

I would like to select only one value for each of the following:

- the first value of a month
- the last value of a month
- the second value of a month
- the 15th of every month
- the day before the 15th
- the day after the 15th

The purpose is to calculate the mean for each of these groups.

The result should look like this:

[Image: Result]

I am struggling with the calculation for the 15th and the days before and after it, as well as for the day after the first.

What I tried so far is:

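import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical file name; the question does not give one
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Average first of month: keep the first row of each calendar month
dffirst = df[~df.index.to_period('M').duplicated()]
monthly_first = dffirst['Value'].mean()

# Average last of month: .last() keeps the last row of each month
# (.max() would keep the largest value of the month instead)
dflast = df.resample("M").last()
monthly_last = dflast['Value'].mean()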

Thank you

Answer 1

As far as I understand, some dates could be missing, which makes it a little more complicated.
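If no dates were missing, each group could be picked directly by calendar day. A minimal sketch of that simpler case, assuming a DataFrame with a DatetimeIndex and a `Value` column; the daily sample data is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up daily data with no gaps (stand-in for the question's table)
dates = pd.date_range("2021-01-01", "2021-03-31", freq="D")
df = pd.DataFrame({"Value": np.arange(len(dates), dtype=float)}, index=dates)

day = df.index.day
masks = {
    "first": df.index.is_month_start,
    "second": day == 2,
    "before 15th": day == 14,
    "15th": day == 15,
    "after 15th": day == 16,
    "last": df.index.is_month_end,
}

# Mean of 'Value' over the rows matching each calendar-day mask
result = pd.Series({name: df.loc[mask, "Value"].mean() for name, mask in masks.items()})
print(result)
```

With gaps in the data, a month whose 15th is missing simply drops out of the "15th" group, and `is_month_start` misses any month whose day 1 is absent, which is why a position-based approach is more robust.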

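What I would do is track the positions (integer indices) of the first and last dates available in each month and go from there, i.e. add 1 to the first position to get the second available date, and add 14 to get the 15th available date. Note that this is the 15th *available* date of the month, which is the calendar 15th only if no earlier dates in that month are missing. Calculating the average values is then straightforward.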

However, you have to make sure that the shifted indices exist (e.g. no negative index and no index beyond the length of your DataFrame).
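A small helper for that bounds check, as a sketch (the name `keep_valid` is hypothetical, not part of pandas or NumPy). It keeps only the shifted positions that fall inside `0 .. n_rows - 1`:

```python
import numpy as np


def keep_valid(positions, n_rows):
    """Hypothetical helper: drop positions that are negative or >= n_rows."""
    positions = np.asarray(positions)
    return positions[(positions >= 0) & (positions < n_rows)]


first = np.array([0, 31, 59])       # example first positions of three months
print(keep_valid(first - 1, 90))    # the -1 is dropped
print(keep_valid(first + 45, 90))   # 104 is dropped for a 90-row frame
```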

For the code below, I assume that the dates are the DataFrame's index and that this index is sorted and unique (the bfill/ffill lookups require it).

import pandas as pd

# month starts that actually contain data (empty months are dropped)
counts = df.resample("MS").size()
month_starts = counts[counts > 0].index

# get indices of first dates available:
# the next value available on or after each month start (method="bfill")
# (Index.get_loc(..., method=...) was removed in pandas 2.0, so get_indexer is used)
indices_first = df.index.get_indexer(month_starts, method="bfill")

# get indices of last dates available:
# the last value available on or before each month end (method="ffill")
month_ends = month_starts + pd.offsets.MonthEnd(0)
indices_last = df.index.get_indexer(month_ends, method="ffill")

# to get indices of 15th dates available
indices_15 = indices_first + 14
indices_15 = indices_15[indices_15 < len(df)]

# to get indices before last
indices_before_last = indices_last - 1
indices_before_last = indices_before_last[indices_before_last >= 0]
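The question also asks for the second date and the days before and after the 15th. Following the same idea (first position plus an offset), a sketch might look like this. It assumes a sorted, unique DatetimeIndex; the sample data (with three February days removed to mimic missing dates) and the helper `shifted` are hypothetical, and the same-month check is an extra safeguard so that an offset never spills into the next month:

```python
import numpy as np
import pandas as pd

# Made-up daily data with a gap: 2021-02-10 .. 2021-02-12 are missing
dates = pd.date_range("2021-01-01", "2021-03-31", freq="D")
dates = dates.drop(pd.date_range("2021-02-10", "2021-02-12"))
df = pd.DataFrame({"Value": np.arange(len(dates), dtype=float)}, index=dates)

counts = df.resample("MS").size()
month_starts = counts[counts > 0].index
indices_first = df.index.get_indexer(month_starts, method="bfill")


def shifted(offset):
    """Hypothetical helper: positions first + offset that exist and stay in the same month."""
    pos = indices_first + offset
    in_range = pos < len(df)
    pos, start = pos[in_range], indices_first[in_range]
    same_month = df.index[pos].to_period("M") == df.index[start].to_period("M")
    return pos[same_month]


indices_second = shifted(1)       # 2nd available date
indices_before_15 = shifted(13)   # 14th available date
indices_15 = shifted(14)          # 15th available date
indices_after_15 = shifted(15)    # 16th available date
print(df.index[indices_15])
```

Because of the gap, the "15th" picked for February is 2021-02-18, i.e. the 15th available date rather than the calendar 15th.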

You can then access the corresponding rows of your dataframe:

avg_first = df.iloc[indices_first]['Value'].mean()
avg_15th = df.iloc[indices_15]['Value'].mean()
avg_before_last = df.iloc[indices_before_last]['Value'].mean()
avg_last = df.iloc[indices_last]['Value'].mean()
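An alternative not used in the answer above is `groupby(...).nth(k)`, which picks the k-th available row within each month and skips months that have fewer rows, so no bounds or month-spill check is needed. A sketch that builds the whole result table this way, with made-up daily sample data:

```python
import numpy as np
import pandas as pd

# Made-up daily data (stand-in for the question's table)
dates = pd.date_range("2021-01-01", "2021-03-31", freq="D")
df = pd.DataFrame({"Value": np.arange(len(dates), dtype=float)}, index=dates)

# Group the values by calendar month
by_month = df["Value"].groupby(df.index.to_period("M"))

# nth(k) is the k-th available row of each month (0-based); nth(-1) is the last
result = pd.Series({
    "first": by_month.nth(0).mean(),
    "second": by_month.nth(1).mean(),
    "before 15th": by_month.nth(13).mean(),
    "15th": by_month.nth(14).mean(),
    "after 15th": by_month.nth(15).mean(),
    "last": by_month.nth(-1).mean(),
})
print(result)
```

Like the position-based code, this selects the k-th *available* date, not the calendar day.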

solution1
0 2021-04-07 15:50:06
