Pandas: DataFrame itertuples, boolean Series, groupby optimization

Question

I'm new to Python. I have a data frame (DF), for example:

id	type
1	A
1	B
2	C
2	B

I would like to add a column, for example A_flag, computed per id group: 1 for every row of an id if any row with that id has type A, otherwise 0. In the end I want this data frame (DF):

id	type	A_flag
1	A	1
1	B	1
2	C	0
2	B	0

I can do this in two steps:

DF['A_flag_tmp'] = [1 if x.type=='A' else 0 for x in DF.itertuples()]
DF['A_flag'] = DF.groupby(['id'])['A_flag_tmp'].transform(np.max)

It works, but it is very slow for a big data frame. Is there any way to optimize this case? Thanks for the help.

Answer 1

Replace the slow iterative code with fast vectorized code: in your first step, generate the boolean Series with pandas built-in functions, e.g.

df['type'].eq('A')

Then you can attach it to the groupby statement for the second step, as follows:

df['A_flag'] = df['type'].eq('A').groupby(df['id']).transform('max').astype(int)

Result:

print(df)


   id type  A_flag
0   1    A       1
1   1    B       1
2   2    C       0
3   2    B       0
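
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is the one from the question; the comparison against the original two-step version (the names tmp and slow_flag) is added here for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"id": [1, 1, 2, 2], "type": ["A", "B", "C", "B"]})

# Vectorized version: build a boolean Series, then take the per-id max,
# which is True if any row in that id's group has type 'A'.
df["A_flag"] = df["type"].eq("A").groupby(df["id"]).transform("max").astype(int)

# Original two-step version from the question, kept only for comparison.
tmp = pd.Series([1 if x.type == "A" else 0 for x in df.itertuples()], index=df.index)
slow_flag = tmp.groupby(df["id"]).transform("max")

print(df)
print((df["A_flag"] == slow_flag).all())
```

Both versions should produce the same flags; the vectorized one avoids the Python-level loop over rows, which is where the time goes on a large frame.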

In general, if you have more complicated conditions, you can also define them in a vectorized way, e.g. define the boolean Series m by:

m = (df['type'].eq('A') & df['type1'].gt(1)) | (df['type2'] != 0)

Then use it in step 2 as follows:

m.groupby(df['id']).transform('max').astype(int)
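
A small sketch of the combined-condition case, in case it helps to see it on concrete data. The columns type1 and type2 come from the snippet above, but the values and the output column name flag are made up here purely for illustration.

```python
import pandas as pd

# Hypothetical data: the values in type1 and type2 are invented for this example.
df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 3, 3],
    "type":  ["A", "B", "A", "B", "C", "B"],
    "type1": [2, 0, 1, 0, 5, 0],
    "type2": [0, 0, 0, 0, 0, 3],
})

# & binds tighter than |, so this reads:
# (type == 'A' and type1 > 1) or (type2 != 0).
# The explicit parentheses only make that grouping visible.
m = (df["type"].eq("A") & df["type1"].gt(1)) | (df["type2"] != 0)

# Per-id max of the mask: 1 if any row in the group satisfies the condition.
df["flag"] = m.groupby(df["id"]).transform("max").astype(int)

print(df)
```

Here id 1 is flagged through the first condition, id 3 through the second, and id 2 matches neither.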

Answer 1: accepted, 2021-10-12 22:19:17
