Add a column in a pandas dataframe that is the average of another column based on conditions of other columns

Sorry in advance for the long data table. I do not know a more succinct way to construct the dataframe that I have below.

I have a pandas DataFrame:

import pandas as pd

data = {'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'Cycle': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
        'Repetition': ['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2'],
        'Region': ['x', 'x','x','x','x','x','x','x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'x','x','x','x','x','x','x','x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
        'Intensity': [34, 89, 34, 45, 34, 56, 78, 65, 45, 45, 34, 56, 34, 56, 56, 66, 56, 78, 23, 45, 42, 56, 86, 5, 33, 44, 78, 89, 34, 42, 34, 66]}


data_df = pd.DataFrame(data)

I would like to add a column that calculates the average intensity when Cycle == 1 for each ID (A and B) and each Region (x and y) and leaves NaN values in all other rows. The resulting dataframe would be:

wanted_data = {'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'Cycle': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
        'Repetition': ['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2'],
        'Region': ['x', 'x','x','x','x','x','x','x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'x','x','x','x','x','x','x','x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
        'Intensity': [34, 89, 34, 45, 34, 56, 78, 65, 45, 45, 34, 56, 34, 56, 56, 66, 56, 78, 23, 45, 42, 56, 86, 5, 33, 44, 78, 89, 34, 42, 34, 66],
        'Mean Cycle1 Intensity': [39.5, '', '', '', 34, '', '', '', '', '', '', '', '', '', '', '', 44.5, '', '', '', 38, '', '', '', '', '', '', '', '', '', '', ''] }

wanted_data_df = pd.DataFrame(wanted_data)

I tried adding a function:

def meanC1(df):
    for i in df['ID'] and j in df['Region']:
        if df['Cycle'] == 1:
            df['Mean Cycle1 Intensity'] = df['Intensity'].mean()

But this returns:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
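
For context, the error most likely comes from the line `if df['Cycle'] == 1:`. The comparison produces a boolean Series with one value per row, and `if` needs a single True or False. The sketch below reproduces that on a tiny made-up frame (the values are illustrative assumptions, not the question's data) and shows the mask being used to select rows instead:

```python
import pandas as pd

# Small illustrative frame (hypothetical values, not the question's data)
df = pd.DataFrame({'Cycle': [1, 2, 1], 'Intensity': [10, 20, 30]})

# Comparing a column to a scalar yields a boolean Series, one value per row
is_cycle1 = df['Cycle'] == 1

# Branching on the whole Series asks for a single truth value, which pandas refuses
try:
    if is_cycle1:
        pass
    raised = False
except ValueError:
    raised = True

# Use the mask to select rows rather than branching on it
cycle1_mean = df.loc[is_cycle1, 'Intensity'].mean()
```

Here `raised` ends up True, and `cycle1_mean` is the mean of the two rows where Cycle is 1.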

Use Series.ne to create a boolean mask m that is True wherever Cycle is not 1, then use Series.mask to blank out the Intensity column on m. Next, use Series.groupby to group the masked column on ID and Repetition and transform using mean. Finally, use Series.mask again to blank out the transformed result on the rows where Cycle is not 1:

# Note: Here df refers to `data_df`

m = df['Cycle'].ne(1)
df['Mean Cycle1 Intensity'] = (
    df['Intensity'].mask(m)
    .groupby([df['ID'], df['Repetition']]).transform('mean').mask(m)
)
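
The same chain can be unpacked into intermediate steps to see what each call contributes. This is only a sketch on a four-row frame whose values are borrowed from the ID A, Repetition 1 rows of the question. The names `masked` and `group_mean` are introduced here for illustration:

```python
import pandas as pd

# Four rows taken from the question's data (ID A, Repetition 1, Cycles 1 and 2)
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A'],
    'Cycle': [1, 2, 1, 2],
    'Repetition': ['1', '1', '1', '1'],
    'Intensity': [34, 89, 45, 45],
})

# Step 1: True on every row where Cycle is not 1
m = df['Cycle'].ne(1)

# Step 2: replace Intensity with NaN on those rows, leaving only Cycle 1 values
masked = df['Intensity'].mask(m)

# Step 3: per-group mean (NaN is skipped), broadcast back to every row of the group
group_mean = masked.groupby([df['ID'], df['Repetition']]).transform('mean')

# Step 4: blank out the rows where Cycle is not 1 again
df['Mean Cycle1 Intensity'] = group_mean.mask(m)
```

Step 4 is needed because `transform` returns a value for every row of the group, including the rows that were masked in step 2.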

Result:

   ID  Cycle Repetition Region  Intensity  Mean Cycle1 Intensity
0   A      1          1      x         34                   39.5
1   A      2          1      x         89                    NaN
2   A      3          1      x         34                    NaN
3   A      4          1      x         45                    NaN
4   B      1          1      x         34                   34.0
5   B      2          1      x         56                    NaN
6   B      3          1      x         78                    NaN
7   B      4          1      x         65                    NaN
8   A      1          1      y         45                   39.5
9   A      2          1      y         45                    NaN
10  A      3          1      y         34                    NaN
11  A      4          1      y         56                    NaN
12  B      1          1      y         34                   34.0
13  B      2          1      y         56                    NaN
14  B      3          1      y         56                    NaN
15  B      4          1      y         66                    NaN
16  A      1          2      x         56                   44.5
17  A      2          2      x         78                    NaN
18  A      3          2      x         23                    NaN
19  A      4          2      x         45                    NaN
20  B      1          2      x         42                   38.0
21  B      2          2      x         56                    NaN
22  B      3          2      x         86                    NaN
23  B      4          2      x          5                    NaN
24  A      1          2      y         33                   44.5
25  A      2          2      y         44                    NaN
26  A      3          2      y         78                    NaN
27  A      4          2      y         89                    NaN
28  B      1          2      y         34                   38.0
29  B      2          2      y         42                    NaN
30  B      3          2      y         34                    NaN
31  B      4          2      y         66                    NaN
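
Note that this result fills every Cycle == 1 row, while `wanted_data` in the question shows the value only once per ID and Repetition. The question text also mentions grouping by Region. The sketch below covers both readings on a small frame. Which one the asker intended is an assumption, and the column names `Mean Cycle1 (first row only)` and `Mean Cycle1 (per Region)` are made up for illustration:

```python
import pandas as pd

# Four rows taken from the question's data (ID A, Repetition 1, Regions x and y)
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A'],
    'Cycle': [1, 2, 1, 2],
    'Repetition': ['1', '1', '1', '1'],
    'Region': ['x', 'x', 'y', 'y'],
    'Intensity': [34, 89, 45, 45],
})

is_c1 = df['Cycle'].eq(1)
cycle1_only = df['Intensity'].where(is_c1)  # NaN wherever Cycle is not 1

# Variant 1: mean per (ID, Repetition), kept only on the first Cycle 1 row of each group
means = cycle1_only.groupby([df['ID'], df['Repetition']]).transform('mean')
running = is_c1.astype(int).groupby([df['ID'], df['Repetition']]).cumsum()
first_c1 = is_c1 & running.eq(1)
df['Mean Cycle1 (first row only)'] = means.where(first_c1)

# Variant 2: mean per (ID, Region), kept on every Cycle 1 row
per_region = cycle1_only.groupby([df['ID'], df['Region']]).transform('mean')
df['Mean Cycle1 (per Region)'] = per_region.where(is_c1)
```

Variant 1 matches the layout of `wanted_data`. Variant 2 follows the wording "for each ID and each Region".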
