Pandas: How to use (df.groupby) in a lambda formula
The example below:
import pandas as pd
list1 = ['a','a','a','b','b','b','b','c','c','c']
list2 = range(len(list1))
df = pd.DataFrame(zip(list1, list2), columns= ['Item','Value'])
df
gives:
Required: a GroupFirstValue column, as shown below.
The idea is to use a lambda formula to get the 'first' value of each group. For example, the first value of "a" is 0, the first value of "b" is 3, and the first value of "c" is 7. That's why those numbers appear in the GroupFirstValue column.
Note: I know I can do this in two steps: keep the original df, build a second df grouped by Item, and then merge the two together. The idea is to see whether this can be done more efficiently in a single step.
Many thanks in advance!
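For reference, the two-step approach described in the note might look roughly like the sketch below. It rebuilds the question's df from a dict rather than zip (purely for brevity), and the intermediate names `firsts` and `merged` are hypothetical.

```python
import pandas as pd

list1 = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
df = pd.DataFrame({'Item': list1, 'Value': range(len(list1))})

# Step 1: one row per Item, holding the first Value of that group.
firsts = (
    df.groupby('Item', as_index=False)['Value']
    .first()
    .rename(columns={'Value': 'GroupFirstValue'})
)

# Step 2: merge back onto the original rows; how='left' keeps every row
# of df, in its original order.
merged = df.merge(firsts, on='Item', how='left')
print(merged)
```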
Use groupby and first:
df.groupby('Item')['Value'].first()
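Note that `first()` returns one value per group, indexed by Item, rather than one value per row. A minimal sketch of putting it back onto every row without a merge, assuming the question's df, is to `map` the Item column through that Series (the name `firsts` is illustrative only):

```python
import pandas as pd

list1 = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
df = pd.DataFrame({'Item': list1, 'Value': range(len(list1))})

# One entry per group, indexed by Item.
firsts = df.groupby('Item')['Value'].first()

# Series.map looks each row's Item up in the index of firsts.
df['GroupFirstValue'] = df['Item'].map(firsts)
print(df)
```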
Or you can use transform and assign the result to a new column in your frame:
df['new_col'] = df.groupby('Item')['Value'].transform('first')
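Since the question asks about a lambda: `transform` also accepts a callable. It calls the callable once per group and broadcasts the scalar it returns to every row of that group, so the result lines up with df. A sketch comparing the two forms, assuming the question's df (the column names `via_string` and `via_lambda` are hypothetical):

```python
import pandas as pd

list1 = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
df = pd.DataFrame({'Item': list1, 'Value': range(len(list1))})

# Built-in aggregation by name: first non-null Value of each group.
df['via_string'] = df.groupby('Item')['Value'].transform('first')

# Same idea with a lambda: s is the Value Series of one group and
# s.iloc[0] is its first row (NaN is NOT skipped here, unlike 'first').
df['via_lambda'] = df.groupby('Item')['Value'].transform(lambda s: s.iloc[0])
print(df)
```

The two columns agree for this data because Value has no missing entries. The string form is typically faster, since pandas can use its built-in implementation instead of calling Python code per group.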
Use mask and duplicated:
df['GroupFirstValue'] = df.Value.mask(df.Item.duplicated())
Out[109]:
Item Value GroupFirstValue
0 a 0 0.0
1 a 1 NaN
2 a 2 NaN
3 b 3 3.0
4 b 4 NaN
5 b 5 NaN
6 b 6 NaN
7 c 7 7.0
8 c 8 NaN
9 c 9 NaN
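This works because `duplicated()` flags every occurrence of an Item after its first one, and `mask` replaces the flagged rows with NaN, so only each group's first row keeps its Value. The NaNs force a float column (hence 0.0, 3.0 and 7.0 above). A sketch assuming the question's df, with an optional cast to pandas' nullable `Int64` dtype if integers are preferred:

```python
import pandas as pd

list1 = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
df = pd.DataFrame({'Item': list1, 'Value': range(len(list1))})

# True for every row after the first occurrence of its Item.
is_repeat = df['Item'].duplicated()

# Keep Value on first occurrences only; the nullable Int64 dtype shows
# the blanks as <NA> instead of turning the column into floats.
df['GroupFirstValue'] = df['Value'].mask(is_repeat).astype('Int64')
print(df)
```

Unlike `first()`, this takes the first row in row order and does not skip a missing Value; for this data the two agree.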