Remove groups from a DataFrame that contain only a single unique value in one column
I am processing data with pandas. Column 'A' is an ID column that identifies each group, and column 'E' contains either 1 or 0.
I want to keep only the groups where column E contains both 0 and 1. (I want to delete the rows where column A is 2 or 4, because those groups contain only 1s and only 0s respectively, leaving only the rows where column A is 1, 3, or 5.)
What is the best way to do this?
A B C D E F
1 1 0 0 0 1 1163.7
2 1 0.8 0.8 2.2 0 0
3 1 0.2 0.2 4.4 0 0
4 1 0.8 0.4 0.4 0 0
5 1 0.5 0.7 3.8 0 0
6 2 1 1 8.9 1 116
7 2 1.5 1.5 1.7 1 116
8 2 2 2 8.7 1 116
9 3 3 3 5.0 0 0
10 3 4.5 4.5 2.2 0 0
11 3 6.0 6.5 0.8 0 0
12 3 8 8 0.3 0 0
13 3 5.3 0 0 1 116
14 3 0 0 0 1 116
15 4 0.8 0.8 1.1 0 0
16 4 0.2 0.5 3.4 0 0
17 4 0.4 0.8 3.2 0 0
18 4 0.7 0.5 3.0 0 0
19 5 1 1 1.5 0 0
20 5 1.5 1.5 1.7 0 0
21 5 2 2 7.9 1 116
I want to get the following data.
A B C D E F
1 1 0 0 0 1 1163.7
2 1 0.8 0.8 2.2 0 0
3 1 0.2 0.2 4.4 0 0
4 1 0.8 0.4 0.4 0 0
5 1 0.5 0.7 3.8 0 0
6 3 3 3 5.0 0 0
7 3 4.5 4.5 2.2 0 0
8 3 6.0 6.5 0.8 0 0
9 3 8 8 0.3 0 0
10 3 5.3 0 0 1 116
11 3 0 0 0 1 116
12 5 1 1 1.5 0 0
13 5 1.5 1.5 1.7 0 0
14 5 2 2 7.9 1 116
Compare column E with 0 and with 1, group each comparison by column A using Series.groupby, and transform with any to create a boolean mask that is True for every row of a group containing both values:
m = (df['E'].eq(0).groupby(df['A']).transform('any') &
df['E'].eq(1).groupby(df['A']).transform('any'))
df1 = df[m]
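For anyone who wants to run the mask above, here is a minimal self-contained sketch. It assumes a reduced frame that holds only columns A and E from the question's sample data (the other columns do not affect the result), and the names has_zero, has_one and kept_ids are introduced here purely for illustration.

```python
import pandas as pd

# Assumption: only columns A and E of the question's sample data, index 1..21.
df = pd.DataFrame(
    {
        "A": [1] * 5 + [2] * 3 + [3] * 6 + [4] * 4 + [5] * 3,
        "E": [1, 0, 0, 0, 0] + [1, 1, 1] + [0, 0, 0, 0, 1, 1]
             + [0, 0, 0, 0] + [0, 0, 1],
    },
    index=range(1, 22),
)

# True for every row whose A group contains at least one 0 in E.
has_zero = df["E"].eq(0).groupby(df["A"]).transform("any")
# True for every row whose A group contains at least one 1 in E.
has_one = df["E"].eq(1).groupby(df["A"]).transform("any")

# Keep only rows whose group has both values.
m = has_zero & has_one
df1 = df[m]

kept_ids = sorted(df1["A"].unique().tolist())
print(kept_ids)  # [1, 3, 5]
```

Because transform returns a result aligned with the original index, the mask can be applied to df directly without any merge or isin step.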
Another idea, if column E consists only of zeros and ones, is to keep the groups that have exactly two unique values:
m = df.groupby('A')['E'].nunique().eq(2)
df1 = df[df['A'].isin(m[m].index)]
Result:
print(df1)
A B C D E F
1 1 0.0 0.0 0.0 1 1163.7
2 1 0.8 0.8 2.2 0 0.0
3 1 0.2 0.2 4.4 0 0.0
4 1 0.8 0.4 0.4 0 0.0
5 1 0.5 0.7 3.8 0 0.0
9 3 3.0 3.0 5.0 0 0.0
10 3 4.5 4.5 2.2 0 0.0
11 3 6.0 6.5 0.8 0 0.0
12 3 8.0 8.0 0.3 0 0.0
13 3 5.3 0.0 0.0 1 116.0
14 3 0.0 0.0 0.0 1 116.0
19 5 1.0 1.0 1.5 0 0.0
20 5 1.5 1.5 1.7 0 0.0
21 5 2.0 2.0 7.9 1 116.0
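A related variant, not taken from the answer above, is GroupBy.filter, which keeps or drops whole groups based on a function that returns True or False for each group. The sketch below again assumes a reduced frame with only columns A and E from the question's sample data; the names required and df_f are hypothetical. Unlike the nunique check, it tests explicitly for 0 and 1, so it may still behave sensibly if E ever contains other values.

```python
import pandas as pd

# Assumption: only columns A and E of the question's sample data, index 1..21.
df = pd.DataFrame(
    {
        "A": [1] * 5 + [2] * 3 + [3] * 6 + [4] * 4 + [5] * 3,
        "E": [1, 0, 0, 0, 0] + [1, 1, 1] + [0, 0, 0, 0, 1, 1]
             + [0, 0, 0, 0] + [0, 0, 1],
    },
    index=range(1, 22),
)

# Values that every kept group must contain in column E.
required = {0, 1}

# filter() calls the function once per group and keeps the rows of the
# groups for which it returns True, in their original order.
df_f = df.groupby("A").filter(lambda g: required.issubset(g["E"].tolist()))

print(sorted(df_f["A"].unique().tolist()))  # [1, 3, 5]
```

The function is called in Python once per group, so with a very large number of groups this is likely to be slower than the transform-based mask.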
You can use drop_duplicates on columns A and E, followed by groupby.size, to count how many distinct E values each A group has. Because E is only 0 or 1, a group that contains both values has size 2. Then use the index where the size equals 2 to select the rows:
s = df[['A','E']].drop_duplicates().groupby('A').size()
df_ = df[df['A'].isin(s[s.eq(2)].index)].copy()
print(df_)
A B C D E F
1 1 0.0 0.0 0.0 1 1163.7
2 1 0.8 0.8 2.2 0 0.0
3 1 0.2 0.2 4.4 0 0.0
4 1 0.8 0.4 0.4 0 0.0
5 1 0.5 0.7 3.8 0 0.0
9 3 3.0 3.0 5.0 0 0.0
10 3 4.5 4.5 2.2 0 0.0
11 3 6.0 6.5 0.8 0 0.0
12 3 8.0 8.0 0.3 0 0.0
13 3 5.3 0.0 0.0 1 116.0
14 3 0.0 0.0 0.0 1 116.0
19 5 1.0 1.0 1.5 0 0.0
20 5 1.5 1.5 1.7 0 0.0
21 5 2.0 2.0 7.9 1 116.0
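To make the intermediate step visible, here is a minimal runnable sketch of the same drop_duplicates approach. It assumes a reduced frame with only columns A and E from the question's sample data, and the name keep is introduced here for illustration.

```python
import pandas as pd

# Assumption: only columns A and E of the question's sample data, index 1..21.
df = pd.DataFrame(
    {
        "A": [1] * 5 + [2] * 3 + [3] * 6 + [4] * 4 + [5] * 3,
        "E": [1, 0, 0, 0, 0] + [1, 1, 1] + [0, 0, 0, 0, 1, 1]
             + [0, 0, 0, 0] + [0, 0, 1],
    },
    index=range(1, 22),
)

# One row per distinct (A, E) pair, then count the pairs per A.
s = df[["A", "E"]].drop_duplicates().groupby("A").size()
print(s.to_dict())  # {1: 2, 2: 1, 3: 2, 4: 1, 5: 2}

# IDs whose group has two distinct E values, i.e. both 0 and 1.
keep = s[s.eq(2)].index
df_ = df[df["A"].isin(keep)].copy()

print(sorted(df_["A"].unique().tolist()))  # [1, 3, 5]
```

The size-equals-2 test relies on E holding only 0 and 1, as stated in the question.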