Pandas DataFrame: find the conditional mean of all observations that meet conditions that differ for each row

Question

Let's say that I have a dataframe like this:

            date   M1_start     M1_end  SimPrices_t0_exp
    0 2017-12-31 2018-01-01 2018-01-31         16.151667
    1 2018-01-01 2018-02-01 2018-02-28         45.138445
    2 2018-01-02 2018-02-01 2018-02-28         56.442648
    3 2018-01-03 2018-02-01 2018-02-28         59.769931
    4 2018-01-04 2018-02-01 2018-02-28         50.171695

For every observation, I want to get the mean of the SimPrices_t0_exp values of all observations whose 'date' falls between that observation's M1_start and M1_end.

I have tried this:

    mask = ((df['date'] >= df['M1_start']) & (df['date'] <= df['M1_end']))
    df['mymean'] = df['SimPrices_t0_exp'][mask].mean()

However, this returns NaN for every observation. I believe this is because the mask is evaluated row by row, checking each row's own date against its own M1_start and M1_end, which is never true.
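
A minimal sketch of why the attempt yields NaN, assuming the five sample rows shown above are the whole dataframe (the variable names `n_matching` and `empty_mean` are only illustrative). The row-wise mask selects nothing, and the mean of an empty selection is NaN, which is then broadcast to the whole column:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-12-31", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"]),
    "M1_start": pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2018-02-01", "2018-02-01", "2018-02-01"]),
    "M1_end": pd.to_datetime(
        ["2018-01-31", "2018-02-28", "2018-02-28", "2018-02-28", "2018-02-28"]),
    "SimPrices_t0_exp": [16.151667, 45.138445, 56.442648, 59.769931, 50.171695],
})

# Each row's date is compared only with that same row's window.
mask = (df["date"] >= df["M1_start"]) & (df["date"] <= df["M1_end"])

n_matching = int(mask.sum())                             # rows inside their own window
empty_mean = df.loc[mask, "SimPrices_t0_exp"].mean()     # mean over the selected rows

print(n_matching)
print(empty_mean)
```

With this sample no row's date lies inside its own window, so the selection is empty and a single NaN is assigned to every row.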

Can somebody help me? I have been struggling with this problem for two days.

Example: for the first observation, the resulting column would hold the average of 45.13, 56.44, 59.76 and 50.17 in this particular case.

If it helps, the pseudocode would be something like this:

for obs in observations:
   start = obs.start
   end = obs.end
   total = 0
   obs_count = 0
   for obs2 in observations:
      if obs2.date >= start and obs2.date <= end:
         total += obs2.SimPrices_t0_exp   # value of the matching row, not of obs itself
         obs_count += 1
   obs.mean = total/obs_count if obs_count > 0 else NaN   # no match: leave as NaN
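
The pseudocode above can be sketched as runnable pandas code. This is only an illustration under the assumption that the sample rows are the whole dataframe; the helper name `conditional_means` is hypothetical, and the loop is still quadratic in the number of rows:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-12-31", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"]),
    "M1_start": pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2018-02-01", "2018-02-01", "2018-02-01"]),
    "M1_end": pd.to_datetime(
        ["2018-01-31", "2018-02-28", "2018-02-28", "2018-02-28", "2018-02-28"]),
    "SimPrices_t0_exp": [16.151667, 45.138445, 56.442648, 59.769931, 50.171695],
})


def conditional_means(frame):
    """For each row, average SimPrices_t0_exp over all rows whose date
    lies in that row's [M1_start, M1_end] window (both ends inclusive)."""
    means = []
    for start, end in zip(frame["M1_start"], frame["M1_end"]):
        in_window = frame["date"].between(start, end)
        # The mean of an empty selection is NaN, so rows without matches stay NaN.
        means.append(frame.loc[in_window, "SimPrices_t0_exp"].mean())
    return pd.Series(means, index=frame.index)


df["mymean"] = conditional_means(df)
print(df[["date", "M1_start", "M1_end", "mymean"]])
```

For the sample data only the first row has dates inside its window; the other rows have February windows and therefore get NaN.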

Thanks!!

Answer 1

Here is one way to do this using a Cartesian merge (not a good choice for large datasets), filtering, and groupby:

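# Add a constant key so that merging on it pairs every row with every other row
df = df.assign(key=1)
df_m = df.merge(df, on='key')

# Keep the pairs whose date_y lies inside row x's window, then average per window
df_m.query('M1_start_x <= date_y <= M1_end_x').groupby(['M1_start_x','M1_end_x'])['SimPrices_t0_exp_y'].mean()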

Output:

M1_start_x  M1_end_x  
2018-01-01  2018-01-31    52.88068
Name: SimPrices_t0_exp_y, dtype: float64
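
The grouped result has one entry per window rather than one per row. A possible way to attach it to the original rows as a column is sketched below; the names `pairs`, `window_means` and `result` are illustrative, the column name `mymean` is taken from the question, and a plain boolean mask stands in for `query`. It assumes each (M1_start, M1_end) pair identifies one window:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-12-31", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"]),
    "M1_start": pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2018-02-01", "2018-02-01", "2018-02-01"]),
    "M1_end": pd.to_datetime(
        ["2018-01-31", "2018-02-28", "2018-02-28", "2018-02-28", "2018-02-28"]),
    "SimPrices_t0_exp": [16.151667, 45.138445, 56.442648, 59.769931, 50.171695],
})

# Cartesian merge on a constant key: every row (_x) paired with every row (_y).
keyed = df.assign(key=1)
pairs = keyed.merge(keyed, on="key", suffixes=("_x", "_y"))

# Keep pairs where the _y date falls inside the _x window (inclusive).
in_window = pairs[(pairs["date_y"] >= pairs["M1_start_x"])
                  & (pairs["date_y"] <= pairs["M1_end_x"])]

# Mean per window, with the key columns renamed back to the original names.
window_means = (
    in_window.groupby(["M1_start_x", "M1_end_x"])["SimPrices_t0_exp_y"]
    .mean()
    .rename("mymean")
    .reset_index()
    .rename(columns={"M1_start_x": "M1_start", "M1_end_x": "M1_end"})
)

# Left merge so that rows whose window contains no dates keep NaN.
result = df.merge(window_means, on=["M1_start", "M1_end"], how="left")
print(result)
```

The left merge keeps all five rows; windows that contain no dates, which here are the February ones, end up with NaN in `mymean`.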


Solution 1
0 Accepted 2018-05-14 18:31:18