Add a column to a pandas DataFrame that is the mean of another column, conditional on other columns

Question

Sorry in advance for the long data table. I do not know a more succinct way to construct the DataFrame below.

I have a pandas DataFrame:

data = {'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'Cycle': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
        'Repetition': ['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2'],
        'Region': ['x', 'x','x','x','x','x','x','x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'x','x','x','x','x','x','x','x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
        'Intensity': [34, 89, 34, 45, 34, 56, 78, 65, 45, 45, 34, 56, 34, 56, 56, 66, 56, 78, 23, 45, 42, 56, 86, 5, 33, 44, 78, 89, 34, 42, 34, 66]}


import pandas as pd  # required for pd.DataFrame
data_df = pd.DataFrame(data)

I would like to add a column that calculates the average intensity when Cycle == 1 for each ID (A and B) and each Region (x and y) and leaves NaN values in all other rows. The resulting DataFrame would be:

wanted_data = {'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'Cycle': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
        'Repetition': ['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2'],
        'Region': ['x', 'x','x','x','x','x','x','x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'x','x','x','x','x','x','x','x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
        'Intensity': [34, 89, 34, 45, 34, 56, 78, 65, 45, 45, 34, 56, 34, 56, 56, 66, 56, 78, 23, 45, 42, 56, 86, 5, 33, 44, 78, 89, 34, 42, 34, 66],
        'Mean Cycle1 Intensity': [39.5, '', '', '', 34, '', '', '', '', '', '', '', '', '', '', '', 44.5, '', '', '', 38, '', '', '', '', '', '', '', '', '', '', ''] }

wanted_data_df = pd.DataFrame(wanted_data)

I tried adding a function:

def meanC1(df):
    for i in df['ID'] and j in df['Region']:
        if df['Cycle'] == 1:
            df['Mean Cycle1 Intensity'] = df['Intensity'].mean()

But this returns:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer 1

Use Series.ne to create a boolean mask m, then use Series.mask to blank out the Intensity column where m is True. Next, use Series.groupby to group the masked column on ID and Repetition and transform using mean. Finally, use Series.mask again to mask the transformed result:

# Note: Here df refers to `data_df`

m = df['Cycle'].ne(1)
df['Mean Cycle1 Intensity'] = (
    df['Intensity'].mask(m)
    .groupby([df['ID'], df['Repetition']]).transform('mean').mask(m)
)

Result:

   ID  Cycle Repetition Region  Intensity  Mean Cycle1 Intensity
0   A      1          1      x         34                   39.5
1   A      2          1      x         89                    NaN
2   A      3          1      x         34                    NaN
3   A      4          1      x         45                    NaN
4   B      1          1      x         34                   34.0
5   B      2          1      x         56                    NaN
6   B      3          1      x         78                    NaN
7   B      4          1      x         65                    NaN
8   A      1          1      y         45                   39.5
9   A      2          1      y         45                    NaN
10  A      3          1      y         34                    NaN
11  A      4          1      y         56                    NaN
12  B      1          1      y         34                   34.0
13  B      2          1      y         56                    NaN
14  B      3          1      y         56                    NaN
15  B      4          1      y         66                    NaN
16  A      1          2      x         56                   44.5
17  A      2          2      x         78                    NaN
18  A      3          2      x         23                    NaN
19  A      4          2      x         45                    NaN
20  B      1          2      x         42                   38.0
21  B      2          2      x         56                    NaN
22  B      3          2      x         86                    NaN
23  B      4          2      x          5                    NaN
24  A      1          2      y         33                   44.5
25  A      2          2      y         44                    NaN
26  A      3          2      y         78                    NaN
27  A      4          2      y         89                    NaN
28  B      1          2      y         34                   38.0
29  B      2          2      y         42                    NaN
30  B      3          2      y         34                    NaN
31  B      4          2      y         66                    NaN
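
To see what each step of the chain above produces, here is a minimal sketch on a small frame. The frame `demo` and the intermediate names `cycle1_only`, `group_mean` and `first_only` are made up for illustration and are not part of the original answer. Note that the grouping is on ID and Repetition, which is what reproduces the values in `wanted_data` (for example, 39.5 is the mean of 34 and 45, the two Cycle 1 intensities for ID A, Repetition 1). The last step is an assumed variant, in case the mean should appear only on the first Cycle 1 row of each group, as in the layout of `wanted_data`.

```python
import pandas as pd

# Hypothetical 6-row frame (a small subset in the shape of the asker's data)
demo = pd.DataFrame({
    'ID':         ['A', 'A', 'A', 'A', 'B', 'B'],
    'Cycle':      [1, 2, 1, 2, 1, 2],
    'Repetition': ['1', '1', '1', '1', '1', '1'],
    'Region':     ['x', 'x', 'y', 'y', 'x', 'x'],
    'Intensity':  [34, 89, 45, 45, 34, 56],
})

# Step 1: True wherever Cycle is NOT 1
m = demo['Cycle'].ne(1)

# Step 2: replace Intensity with NaN on those rows, keeping only Cycle 1 values
cycle1_only = demo['Intensity'].mask(m)

# Step 3: per (ID, Repetition) mean, broadcast back to every row of the group;
# NaN values are skipped by mean, so only Cycle 1 rows contribute
group_mean = cycle1_only.groupby([demo['ID'], demo['Repetition']]).transform('mean')

# Step 4: blank out the rows where Cycle != 1 again
demo['Mean Cycle1 Intensity'] = group_mean.mask(m)

# Assumed variant: keep the mean only on the first Cycle 1 row per (ID, Repetition)
first_only = group_mean.mask(m | demo.duplicated(['ID', 'Repetition', 'Cycle']))

print(demo)
print(first_only)
```

Step 3 is the reason for masking twice: `transform('mean')` fills every row of a group with the group mean, so the second `mask(m)` is what restores NaN on the rows where Cycle is not 1.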

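As for the ValueError in the question: `df['Cycle'] == 1` compares the whole column at once and returns a boolean Series, and a Series with more than one element cannot be used as the condition of a plain `if`. The sketch below reproduces that error and then uses the same boolean Series for row selection instead. It is an alternative to the mask-based answer, not the answer's method; `demo`, `is_cycle1` and `cycle1_rows` are hypothetical names on a made-up frame.

```python
import pandas as pd

# Hypothetical 6-row frame (a small subset in the shape of the asker's data)
demo = pd.DataFrame({
    'ID':         ['A', 'A', 'A', 'A', 'B', 'B'],
    'Cycle':      [1, 2, 1, 2, 1, 2],
    'Repetition': ['1', '1', '1', '1', '1', '1'],
    'Region':     ['x', 'x', 'y', 'y', 'x', 'x'],
    'Intensity':  [34, 89, 45, 45, 34, 56],
})

# Element-wise comparison: one True/False per row, not a single bool
is_cycle1 = demo['Cycle'] == 1

# Using the Series directly in `if` raises the error from the question
try:
    if is_cycle1:
        pass
except ValueError as err:
    print('ValueError:', err)

# Use the boolean Series to select rows, then compute the group means
cycle1_rows = demo[is_cycle1]

# Assign only to the Cycle 1 rows; the remaining rows of the new column stay NaN
demo.loc[is_cycle1, 'Mean Cycle1 Intensity'] = (
    cycle1_rows.groupby(['ID', 'Repetition'])['Intensity'].transform('mean')
)

print(demo)
```

The assignment relies on index alignment: the transformed Series keeps the index labels of the selected rows, so each mean lands on the row it came from.
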
Answer 1: score 2, accepted, 2020-07-15 18:34:17
