How to drop duplicates in one column based on values in 2 other columns in DataFrame in Python Pandas?
I have a DataFrame in Python Pandas like below:
Data types:
ID - int
TYPE - object
TG_A - int
TG_B - int
ID | TYPE | TG_A | TG_B
----|------|------|-----
111 | A | 1 | 0
111 | B | 1 | 0
222 | B | 1 | 0
222 | A | 1 | 0
333 | B | 0 | 1
333 | A | 0 | 1
And I need to drop duplicates in the above DataFrame, so as to:
So, as a result, I need something like below:
ID | TYPE | TG_A | TG_B
----|------|------|-----
111 | A | 1 | 0
222 | A | 1 | 0
333 | B | 0 | 1
How can I do that in Python Pandas?
You can use two boolean masks and groupby.idxmax to get, for each ID, the first row that matches neither mask:
m1 = df['TYPE'].eq('B') & df['TG_A'].eq(1)
m2 = df['TYPE'].eq('A') & df['TG_B'].eq(1)
out = df.loc[(~(m1|m2)).groupby(df['ID']).idxmax()]
Output:
ID TYPE TG_A TG_B
0 111 A 1 0
3 222 A 1 0
4 333 B 0 1
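For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample DataFrame from the question and applies the same two masks. The variable name `keep` and the `astype(int)` cast are my additions (the cast only makes the 0/1 comparison explicit for `idxmax`); everything else follows the snippet above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID":   [111, 111, 222, 222, 333, 333],
    "TYPE": ["A", "B", "B", "A", "B", "A"],
    "TG_A": [1, 1, 1, 1, 0, 0],
    "TG_B": [0, 0, 0, 0, 1, 1],
})

# Rows to discard: TYPE B where TG_A is set, or TYPE A where TG_B is set
m1 = df["TYPE"].eq("B") & df["TG_A"].eq(1)
m2 = df["TYPE"].eq("A") & df["TG_B"].eq(1)

# 1 marks a row worth keeping; idxmax returns the index label of the
# first 1 within each ID group
keep = (~(m1 | m2)).astype(int)
out = df.loc[keep.groupby(df["ID"]).idxmax()]

print(out)
```

Note that `idxmax` always returns exactly one row per ID. If a group had no row passing the masks, it would still return that group's first row, so this assumes every ID has at least one valid row.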
df[df['TYPE'].eq('A').eq(df['TG_A'])]
result
ID TYPE TG_A TG_B
0 111 A 1 0
3 222 A 1 0
4 333 B 0 1
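The one-liner works because pandas compares the boolean "TYPE is A" with the 0/1 integers in TG_A element-wise, and `True == 1` and `False == 0` both hold. Below is a sketch on the question's sample data with the intermediate step spelled out. The names `is_a` and `result` are mine, and it assumes TG_A only ever contains 0 or 1; TG_B is not consulted at all.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID":   [111, 111, 222, 222, 333, 333],
    "TYPE": ["A", "B", "B", "A", "B", "A"],
    "TG_A": [1, 1, 1, 1, 0, 0],
    "TG_B": [0, 0, 0, 0, 1, 1],
})

# True where the row is TYPE A, False otherwise
is_a = df["TYPE"].eq("A")

# Keep rows where "is TYPE A" agrees with the TG_A flag:
#   TYPE A with TG_A == 1, or any other TYPE with TG_A == 0
result = df[is_a.eq(df["TG_A"])]

print(result)
```

Unlike the groupby approach, this is a plain row filter: an ID with no matching row disappears from the result, and an ID with several matching rows keeps all of them.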