How to aggregate a DataFrame and drop duplicates based on values in two columns in Python Pandas?

I have a DataFrame in Python Pandas like the one below:

ID  | COL1| COL2 | COL3
----|-----|------|-----
123 | XXX | 0    | 1
123 | XXX | 1    | 1
444 | ABC | 1    | 1
444 | ABC | 1    | 1 
555 | PPP | 0    | 0

I need to drop duplicates in the DataFrame above as follows:

  • if COL2 or COL3 contains '1' at least once for an ID, that column should be 1 for that ID in the output (no matter how often 0 appears in that column)
  • the rest of the columns should still be in the output
  • in COL1 there are no duplicates per ID

As a result I need output like the one below (I have many more columns, so the output must contain not only ID, COL2, COL3, but ID, COL1, COL2, COL3):

ID  | COL1| COL2 | COL3
----|-----|------|-----
123 | XXX | 1    | 1
444 | ABC | 1    | 1
555 | PPP | 0    | 0

How can I do that in Python Pandas?

Use groupby.max. Since COL2 and COL3 only hold 0 or 1, the maximum per group is 1 whenever 1 appears at least once:

out = df.groupby(['ID', 'COL1'], as_index=False).max()

Output:

    ID COL1  COL2  COL3
0  123  XXX     1     1
1  444  ABC     1     1
2  555  PPP     0     0
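
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample data is copied from the question. The second variant (grouping by ID only, with a per-column aggregation) is not part of the answer above; it is an assumption about how one might handle the "many more columns" case, and the names flag_cols, agg_spec and out_by_id are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   [123, 123, 444, 444, 555],
    "COL1": ["XXX", "XXX", "ABC", "ABC", "PPP"],
    "COL2": [0, 1, 1, 1, 0],
    "COL3": [1, 1, 1, 1, 0],
})

# Approach from the answer: group by the columns that are constant per ID
# and take the maximum of every remaining column.
out = df.groupby(["ID", "COL1"], as_index=False).max()

# Variant (an assumption, not from the answer): group by ID only and pick
# an aggregation per column. Flag columns use "max"; every other column
# keeps its first value, which is safe only if it is constant per ID.
flag_cols = ["COL2", "COL3"]  # hypothetical list of 0/1 flag columns
agg_spec = {
    col: ("max" if col in flag_cols else "first")
    for col in df.columns
    if col != "ID"
}
out_by_id = df.groupby("ID", as_index=False).agg(agg_spec)

print(out)
print(out_by_id)
```

Grouping by ['ID', 'COL1'] is the shorter option when only a few columns are constant per ID. The dictionary form avoids listing every such column in the groupby key, at the cost of trusting that their first value is representative.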


