Pandas: Apply a custom function to groups and store the result in new columns in each group

Question

I am trying to apply a custom function to each group in a groupby object and store the result in new columns within each group. The function returns two values, and I want to store them separately in two columns in each group.

I have tried this:

# Returns False if the group has more than one row and every value in Column1 is identical; True otherwise.
def is_unique(x):
    status = True
    if len(x) > 1:
        a = x.to_numpy() 
        if (a[0] == a).all():
            status = False
    return status

# Finds difference of the column values and returns the value with a message.
def func(x):
    d = x['Column3'].diff().dropna().iloc[0]
    return d, "Calculated!"

# is_unique() is another custom function used to filter unique groups.
df[['Difference', 'Message']] = df.filter(lambda x: is_unique(x['Column1'])).groupby(['Column2']).apply(lambda s: func(s))

But I am getting the error: 'DataFrameGroupBy' object does not support item assignment

I don't want to reset the index, and I want to view the result using the get_group function. The final dataframe should look like:

df.get_group('XYZ')


   -----------------------------------------------------------------
   |   Column1 | Column2 | Column3  |  Difference   |    Message   |
   -----------------------------------------------------------------
   | 0   A     |   XYZ   |   100    |               |              |
   ----------------------------------               |              |
   | 1   B     |   XYZ   |    20    |      70       |  Calculated! |
   ----------------------------------               |              |
   | 2   C     |   XYZ   |    10    |               |              |
   -----------------------------------------------------------------

What is the most efficient way to achieve this result?

Answer 1

I think you need:

def func(x):
    d = x['Column3'].diff().dropna().iloc[0]
    last = x.index[-1]
    x.loc[last, 'Difference'] = d
    x.loc[last, 'Message'] = "Calculated!"
    return x

df1 = df.filter(lambda x: is_unique(x['Column1']))

df1 = df1.groupby(['Column2']).apply(func)
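
For reference, here is a self-contained sketch of the same idea that can be run as-is. The sample data, the extra PQR group, and the names kept, first_diff, last and result are made up for illustration and are not from the question. Instead of modifying each group inside apply, it computes one value per group and writes it onto the last row of that group through index alignment, which leaves the original index untouched.

```python
import pandas as pd


def is_unique(x):
    # Same check as in the question: False only when a multi-row
    # group holds a single repeated value.
    if len(x) > 1:
        a = x.to_numpy()
        if (a[0] == a).all():
            return False
    return True


# Hypothetical sample data (not from the question).
df = pd.DataFrame({
    "Column1": ["A", "B", "C", "D", "D"],
    "Column2": ["XYZ", "XYZ", "XYZ", "PQR", "PQR"],
    "Column3": [100, 20, 10, 5, 7],
})

# Drop the groups that fail is_unique; the original index is preserved.
kept = df.groupby("Column2").filter(lambda g: is_unique(g["Column1"]))
grouped = kept.groupby("Column2")

# One value per group: the first non-null difference of Column3.
# Note: a single-row group would have no difference and .iloc[0] would raise.
first_diff = grouped["Column3"].apply(lambda s: s.diff().dropna().iloc[0])

# Last row of each group, still carrying the original index labels.
last = grouped.tail(1)

# Assigning a Series aligns on the index, so every other row becomes NaN.
result = kept.copy()
result["Difference"] = last["Column2"].map(first_diff)
result["Message"] = pd.Series("Calculated!", index=last.index)

# Inspect a single group without resetting the index.
print(result.groupby("Column2").get_group("XYZ"))
```

With this sample data the first difference for XYZ is 20 - 100 = -80, and it lands on the row with index 2. The PQR group is dropped because both of its Column1 values are the same.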
