How to aggregate a DataFrame and drop duplicates based on values in two columns in Python Pandas?
I have a DataFrame in Python Pandas like the one below:
ID  | COL1| COL2 | COL3
----|-----|------|-----
123 | XXX | 0 | 1
123 | XXX | 1 | 1
444 | ABC | 1 | 1
444 | ABC | 1 | 1
555 | PPP | 0 | 0
I need to drop duplicates in the above DataFrame in the following way:
As a result, I need output like the one below. (I have many more columns, so the output must contain not only ID, COL2, COL3, but ID, COL1, COL2, COL3.)
ID | COL1| COL2 | COL3
----|-----|------|-----
123 | XXX | 1 | 1
444 | ABC | 1 | 1
555 | PPP | 0 | 0
How can I do that in Python Pandas?
Use groupby.max, grouping on ID and COL1 and taking the maximum of each remaining column:
out = df.groupby(['ID', 'COL1'], as_index=False).max()
Output:
ID COL1 COL2 COL3
0 123 XXX 1 1
1 444 ABC 1 1
2 555 PPP 0 0
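For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question. Note that max() is taken per column independently, so with other data the result row might combine values from different source rows. The variable alt shows one possible alternative (my assumption, not part of the answer above) that keeps an actual existing row per ID by sorting and then calling drop_duplicates.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "ID": [123, 123, 444, 444, 555],
    "COL1": ["XXX", "XXX", "ABC", "ABC", "PPP"],
    "COL2": [0, 1, 1, 1, 0],
    "COL3": [1, 1, 1, 1, 0],
})

# Approach from the answer: one row per (ID, COL1), column-wise maximum
out = df.groupby(["ID", "COL1"], as_index=False).max()
print(out)

# Alternative sketch (an assumption, not from the answer): keep a real row
# per ID by putting the largest COL2/COL3 values first, then dropping repeats
alt = (
    df.sort_values(["COL2", "COL3"], ascending=False)
      .drop_duplicates("ID")
      .sort_values("ID")
      .reset_index(drop=True)
)
print(alt)
```

On this sample both approaches give the same three rows. If there are many more value columns, groupby.max applies to all of them, which may or may not be what you want.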