Pandas: How to use (df.groupby) in a lambda formula

Question

The example below:

import pandas as pd
list1 = ['a','a','a','b','b','b','b','c','c','c']
list2 = range(len(list1))
df = pd.DataFrame(list(zip(list1, list2)), columns=['Item', 'Value'])
df

gives:

  Item  Value
0    a      0
1    a      1
2    a      2
3    b      3
4    b      4
5    b      5
6    b      6
7    c      7
8    c      8
9    c      9

Required: a GroupFirstValue column as shown below.

The idea is to use a lambda formula to get the 'first' value for each group. For example, "a"'s first value is 0, "b"'s first value is 3, and "c"'s first value is 7. That's why those numbers appear in the GroupFirstValue column.

Note: I know I can do this in two steps: build a grouped DataFrame from the original df, then merge the two together. The idea is to see whether this can be done more efficiently in a single step. Many thanks in advance!
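For reference, the two-step approach mentioned in the note might look roughly like the sketch below. The names `firsts` and `merged` are made up for illustration, and the frame is rebuilt from a dict so the snippet runs on its own.

```python
import pandas as pd

items = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
df = pd.DataFrame({'Item': items, 'Value': range(len(items))})

# Step 1: one row per Item holding that group's first Value.
firsts = (
    df.groupby('Item', as_index=False)['Value']
      .first()
      .rename(columns={'Value': 'GroupFirstValue'})
)

# Step 2: merge the per-group result back onto the original rows.
merged = df.merge(firsts, on='Item', how='left')
print(merged)
```

This works, but it creates an intermediate frame and a join, which is the overhead the question hopes to avoid.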

Answer 1

Use groupby and first:

df.groupby('Item')['Value'].first()

Or you can use transform and assign the result to a new column in your frame:

df['new_col'] = df.groupby('Item')['Value'].transform('first')
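Since the question asks specifically about a lambda, here is a minimal sketch that puts the string alias next to an equivalent lambda on the question's data. The column name `GroupFirstValueLambda` is only illustrative. One difference to keep in mind: `'first'` returns the first non-null value in each group, while `s.iloc[0]` returns the first row's value even if it is NaN. The two agree here because the data has no missing values.

```python
import pandas as pd

items = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
df = pd.DataFrame({'Item': items, 'Value': range(len(items))})

grouped = df.groupby('Item')['Value']

# Built-in alias: the per-group result is broadcast back to every row.
df['GroupFirstValue'] = grouped.transform('first')

# Lambda version: returns one scalar per group, which transform broadcasts.
df['GroupFirstValueLambda'] = grouped.transform(lambda s: s.iloc[0])

print(df)
```

The string alias is usually the better default, because it lets pandas use its built-in implementation instead of calling a Python function once per group.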

Answer 2

Use mask and duplicated. This keeps the value on the first row of each Item and leaves NaN on the remaining rows:

df['GroupFirstValue'] = df.Value.mask(df.Item.duplicated())

Out[109]:
  Item  Value  GroupFirstValue
0    a      0              0.0
1    a      1              NaN
2    a      2              NaN
3    b      3              3.0
4    b      4              NaN
5    b      5              NaN
6    b      6              NaN
7    c      7              7.0
8    c      8              NaN
9    c      9              NaN
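If the first value should be repeated on every row of its group rather than left as NaN, one possible extension is to forward-fill the masked column. This is a sketch that assumes the rows of each Item are contiguous, as they are in the question's data. The name `first_only` is illustrative.

```python
import pandas as pd

items = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
df = pd.DataFrame({'Item': items, 'Value': range(len(items))})

# Keep Value only on the first occurrence of each Item; other rows become NaN.
first_only = df['Value'].mask(df['Item'].duplicated())

# Forward-fill within the (assumed contiguous) groups, then restore the
# original integer dtype that the NaN step had upcast to float.
df['GroupFirstValue'] = first_only.ffill().astype(df['Value'].dtype)

print(df)
```

If the groups can be interleaved, the forward-fill would leak values across groups, and the groupby-based transform is the safer choice.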

Answer 1
1, accepted, 2020-04-30 22:28:42

Answer 2
1, 2020-04-30 22:38:11
