Pandas Dataframe: Find the conditional mean of all observations that meet certain conditions that are DIFFERENT in each row

Question

Let's say that I have a dataframe like this:

            date   M1_start     M1_end  SimPrices_t0_exp
    0 2017-12-31 2018-01-01 2018-01-31         16.151667
    1 2018-01-01 2018-02-01 2018-02-28         45.138445
    2 2018-01-02 2018-02-01 2018-02-28         56.442648
    3 2018-01-03 2018-02-01 2018-02-28         59.769931
    4 2018-01-04 2018-02-01 2018-02-28         50.171695

For every row, I want the mean of the SimPrices_t0_exp values from all rows whose 'date' falls between that row's M1_start and M1_end.

I have tried this:

    mask = ((df['date'] >= df['M1_start']) & (df['date'] <= df['M1_end']))
    df['mymean'] = df['SimPrices_t0_exp'][mask].mean()

However, this returns NaN for every row. I believe that is because the mask is evaluated row by row, comparing each row's date with that same row's M1_start and M1_end, which is never true here.

Can somebody help me? I have been struggling with this problem for two days.

Example: for the first row, the new column should hold the average of 45.13, 56.44, 59.76 and 50.17, because those are the rows whose dates fall between 2018-01-01 and 2018-01-31.

If it helps, the pseudocode would be something like this:

    for obs in observations:
        start = obs.M1_start
        end = obs.M1_end
        total = 0
        obs_count = 0
        for obs2 in observations:
            # compare every other row's date against this row's window
            if obs2.date >= start and obs2.date <= end:
                total += obs2.SimPrices_t0_exp
                obs_count += 1
        # undefined (NaN) when no date falls inside the window
        obs.mean = total / obs_count
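
A direct pandas translation of this loop might look like the sketch below. It rebuilds the sample frame from the table above; `window_mean` is a name made up for illustration, and row-wise `apply` will be slow on large frames.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-01", "2018-01-02",
                            "2018-01-03", "2018-01-04"]),
    "M1_start": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-02-01",
                                "2018-02-01", "2018-02-01"]),
    "M1_end": pd.to_datetime(["2018-01-31", "2018-02-28", "2018-02-28",
                              "2018-02-28", "2018-02-28"]),
    "SimPrices_t0_exp": [16.151667, 45.138445, 56.442648,
                         59.769931, 50.171695],
})

def window_mean(row):
    # Hypothetical helper: select every row whose date lies inside
    # this row's [M1_start, M1_end] window (both ends inclusive)
    in_window = df["date"].between(row["M1_start"], row["M1_end"])
    # The mean of an empty selection is NaN
    return df.loc[in_window, "SimPrices_t0_exp"].mean()

df["mymean"] = df.apply(window_mean, axis=1)
```

With the sample data, only the first row's window contains any dates, so the other rows come out as NaN.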

Thanks!!

Answer 1

Here is one way to do this, using a Cartesian (cross) merge, filtering and groupby. The merge pairs every row with every other row, so it is not a good choice for a large dataset:

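    # Add a constant key so that merging on it pairs every row with every row
    df = df.assign(key=1)
    df_m = df.merge(df, on='key')

    # Keep pairs where the right-hand date lies inside the left-hand window,
    # then average the right-hand prices per window
    df_m.query('M1_start_x <= date_y <= M1_end_x').groupby(['M1_start_x','M1_end_x'])['SimPrices_t0_exp_y'].mean()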

Output:

    M1_start_x  M1_end_x  
    2018-01-01  2018-01-31    52.88068
    Name: SimPrices_t0_exp_y, dtype: float64
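
Note that the groupby result has one entry per distinct (M1_start, M1_end) window, and windows that contain no dates drop out of it. If a value is needed on every original row, one alternative is NumPy broadcasting. The sketch below is not part of the approach above; it assumes the three date columns are already datetime64 and rebuilds the sample frame from the question. It still builds an n × n boolean matrix, so the same warning about large datasets applies.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-01", "2018-01-02",
                            "2018-01-03", "2018-01-04"]),
    "M1_start": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-02-01",
                                "2018-02-01", "2018-02-01"]),
    "M1_end": pd.to_datetime(["2018-01-31", "2018-02-28", "2018-02-28",
                              "2018-02-28", "2018-02-28"]),
    "SimPrices_t0_exp": [16.151667, 45.138445, 56.442648,
                         59.769931, 50.171695],
})

dates = df["date"].to_numpy()                 # shape (n,)
start = df["M1_start"].to_numpy()[:, None]    # shape (n, 1)
end = df["M1_end"].to_numpy()[:, None]        # shape (n, 1)
prices = df["SimPrices_t0_exp"].to_numpy()    # shape (n,)

# in_window[i, j] is True when row j's date lies inside row i's window
in_window = (dates >= start) & (dates <= end)

counts = in_window.sum(axis=1)
sums = np.where(in_window, prices, 0.0).sum(axis=1)

# Rows whose window contains no dates get NaN; np.maximum avoids dividing by 0
df["mymean"] = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

Here the first row gets the same mean as in the output above, and the remaining rows, whose February windows contain no dates, get NaN.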

solution1
0 ACCEPTED 2018-05-14 18:31:18
