Remove groups from a DataFrame that contain only a single unique value in one column

I am processing data with pandas. Column 'A' is an ID column and column 'E' contains either 1 or 0. I want to keep only the groups in which column E contains both 0 and 1. (That is, I want to delete the rows where column A is 2 or 4, because those groups contain only 1s and only 0s respectively, leaving only the rows where column A is 1, 3, or 5.)

What is the best way to do this?

    A   B   C   D   E   F
1   1   0   0   0   1   1163.7
2   1   0.8 0.8 2.2 0   0
3   1   0.2 0.2 4.4 0   0
4   1   0.8 0.4 0.4 0   0
5   1   0.5 0.7 3.8 0   0
6   2   1   1   8.9 1   116
7   2   1.5 1.5 1.7 1   116
8   2   2   2   8.7 1   116
9   3   3   3   5.  0   0
10  3   4.5 4.5 2.2 0   0
11  3   6.0 6.5 0.8 0   0
12  3   8   8   0.3 0   0
13  3   5.3 0   0   1   116
14  3   0   0   0   1   116
15  4   0.8 0.8 1.1 0   0
16  4   0.2 0.5 3.4 0   0
17  4   0.4 0.8 3.2 0   0
18  4   0.7 0.5 3.0 0   0
19  5   1   1   1.5 0   0
20  5   1.5 1.5 1.7 0   0
21  5   2   2   7.9 1   116

I want to get the following data.

    A   B   C   D   E   F
1   1   0   0   0   1   1163.7
2   1   0.8 0.8 2.2 0   0
3   1   0.2 0.2 4.4 0   0
4   1   0.8 0.4 0.4 0   0
5   1   0.5 0.7 3.8 0   0
6   3   3   3   5.  0   0
7   3   4.5 4.5 2.2 0   0
8   3   6.0 6.5 0.8 0   0
9   3   8   8   0.3 0   0
10  3   5.3 0   0   1   116
11  3   0   0   0   1   116
12  5   1   1   1.5 0   0
13  5   1.5 1.5 1.7 0   0
14  5   2   2   7.9 1   116

Use Series.groupby on column E, grouped by column A, and transform with any to create a boolean mask:

m = (df['E'].eq(0).groupby(df['A']).transform('any') &
     df['E'].eq(1).groupby(df['A']).transform('any'))
df1 = df[m]
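
To make the mask easier to follow, here is a minimal runnable sketch of the same idea. The small frame below is a made-up stand-in for the question's data (only columns A and E are kept), and the names has_zero and has_one are introduced here for illustration:

```python
import pandas as pd

# Hypothetical miniature of the question's data: only columns A and E matter here.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "E": [1, 0, 1, 1, 0, 1, 0, 0, 0, 1],
})

# True for every row whose A group contains at least one 0 in E ...
has_zero = df["E"].eq(0).groupby(df["A"]).transform("any")
# ... and True for every row whose A group contains at least one 1 in E.
has_one = df["E"].eq(1).groupby(df["A"]).transform("any")

# Keep only the rows whose group satisfies both conditions.
df1 = df[has_zero & has_one]
print(df1)
```

Because transform returns a Series aligned with the original index, the mask can be applied to df directly. Groups 2 and 4 fail one of the two conditions and are dropped.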

Or another idea, if column E consists only of zeros and ones:

m = df.groupby('A')['E'].nunique().eq(2)
df1 = df[df['A'].isin(m[m].index)]

Result:

print(df1)
    A    B    C    D  E       F
1   1  0.0  0.0  0.0  1  1163.7
2   1  0.8  0.8  2.2  0     0.0
3   1  0.2  0.2  4.4  0     0.0
4   1  0.8  0.4  0.4  0     0.0
5   1  0.5  0.7  3.8  0     0.0
9   3  3.0  3.0  5.0  0     0.0
10  3  4.5  4.5  2.2  0     0.0
11  3  6.0  6.5  0.8  0     0.0
12  3  8.0  8.0  0.3  0     0.0
13  3  5.3  0.0  0.0  1   116.0
14  3  0.0  0.0  0.0  1   116.0
19  5  1.0  1.0  1.5  0     0.0
20  5  1.5  1.5  1.7  0     0.0
21  5  2.0  2.0  7.9  1   116.0
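
For reference, the nunique idea can also be written so that the mask lines up with the rows directly, which avoids the isin lookup. This is only a sketch, assuming E holds nothing but 0 and 1 and using a small made-up frame in place of the question's data. The groupby.filter line is an alternative spelling added here for comparison and is not part of the original answer:

```python
import pandas as pd

# Hypothetical miniature of the question's data: only columns A and E matter here.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "E": [1, 0, 1, 1, 0, 1, 0, 0, 0, 1],
})

# Number of distinct E values in each row's group, aligned to the original index.
n_unique = df.groupby("A")["E"].transform("nunique")
df1 = df[n_unique.eq(2)]

# Alternative: let groupby.filter keep whole groups that pass a predicate.
df2 = df.groupby("A").filter(lambda g: g["E"].nunique() == 2)

print(df1)
```

Both variants keep the original row order and index. The filter version calls a Python function once per group, so it may be slower when there are many groups.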

You can use drop_duplicates on columns A and E, then groupby.size to count how many distinct E values each A group has. Since E is only 0 or 1, a group that contains both has size 2. Then keep the rows whose A value is in the index where the size equals 2:

s = df[['A','E']].drop_duplicates().groupby('A').size()
df_ = df[df['A'].isin(s[s.eq(2)].index)].copy()
print(df_)
    A    B    C    D  E       F
1   1  0.0  0.0  0.0  1  1163.7
2   1  0.8  0.8  2.2  0     0.0
3   1  0.2  0.2  4.4  0     0.0
4   1  0.8  0.4  0.4  0     0.0
5   1  0.5  0.7  3.8  0     0.0
9   3  3.0  3.0  5.0  0     0.0
10  3  4.5  4.5  2.2  0     0.0
11  3  6.0  6.5  0.8  0     0.0
12  3  8.0  8.0  0.3  0     0.0
13  3  5.3  0.0  0.0  1   116.0
14  3  0.0  0.0  0.0  1   116.0
19  5  1.0  1.0  1.5  0     0.0
20  5  1.5  1.5  1.7  0     0.0
21  5  2.0  2.0  7.9  1   116.0
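
To see what the intermediate Series s looks like, the same steps can be run on a small made-up frame (a stand-in for the question's data, with only columns A and E). The name keep is introduced here for illustration:

```python
import pandas as pd

# Hypothetical miniature of the question's data: only columns A and E matter here.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "E": [1, 0, 1, 1, 0, 1, 0, 0, 0, 1],
})

# One row per distinct (A, E) pair, then count the pairs in each A group.
s = df[["A", "E"]].drop_duplicates().groupby("A").size()
print(s)

# Group IDs whose E column has exactly two distinct values (both 0 and 1).
keep = s[s.eq(2)].index

# .copy() gives an independent frame, so later edits do not touch df.
df_ = df[df["A"].isin(keep)].copy()
print(df_)
```

Here s is indexed by the values of A, so selecting the entries equal to 2 and taking the index gives exactly the group IDs to keep.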
