Python Pandas - how to remove duplicates depending on column values
You can apply groupby to the result of get_dummies to get the desired output.
>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1,1,1,2,2,2], "B":[1,1,1,2,2,2], "C":["Q","R","QR","R","QR","Q"], "D":[1,1,1,2,2,2], "E":["X","X","X","Y","Y","Y"]})
>>> df
   A  B   C  D  E
0  1  1   Q  1  X
1  1  1   R  1  X
2  1  1  QR  1  X
3  2  2   R  2  Y
4  2  2  QR  2  Y
5  2  2   Q  2  Y
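>>> # dtype=int keeps the indicator columns as 0/1; recent pandas versions default to bool
>>> df = pd.get_dummies(df, columns=["C","E"], dtype=int)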
>>> # sum counts how often each value occurs within a group
>>> df.groupby(["A","B","D"]).sum().reset_index()
   A  B  D  C_Q  C_QR  C_R  E_X  E_Y
0  1  1  1    1     1    1    3    0
1  2  2  2    1     1    1    0    3
>>> # max only flags whether each value occurs at all within a group
>>> df.groupby(["A","B","D"]).max().reset_index()
   A  B  D  C_Q  C_QR  C_R  E_X  E_Y
0  1  1  1    1     1    1    1    0
1  2  2  2    1     1    1    0    1
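For use outside the interactive prompt, the same two steps can be wrapped in a small function. This is only a sketch: the helper name collapse_duplicates and its how parameter are made up for illustration and are not part of pandas, and dtype=int is passed on the assumption that you want 0/1 integer indicators rather than booleans.

```python
import pandas as pd


def collapse_duplicates(df, key_cols, cat_cols, how="max"):
    """Hypothetical helper: return one row per combination of key_cols.

    Each distinct value in cat_cols becomes its own indicator column.
    how="max" flags presence (0/1); how="sum" counts occurrences.
    """
    if how not in ("max", "sum"):
        raise ValueError("how must be 'max' or 'sum'")
    # One-hot encode the categorical columns as integers.
    dummies = pd.get_dummies(df, columns=cat_cols, dtype=int)
    grouped = dummies.groupby(key_cols)
    result = grouped.max() if how == "max" else grouped.sum()
    # Turn the group keys back into ordinary columns.
    return result.reset_index()


df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [1, 1, 1, 2, 2, 2],
    "C": ["Q", "R", "QR", "R", "QR", "Q"],
    "D": [1, 1, 1, 2, 2, 2],
    "E": ["X", "X", "X", "Y", "Y", "Y"],
})

flags = collapse_duplicates(df, ["A", "B", "D"], ["C", "E"])
counts = collapse_duplicates(df, ["A", "B", "D"], ["C", "E"], how="sum")
print(flags)
print(counts)
```

Choose how="max" when you only need to know whether a value appeared in a group, and how="sum" when the number of occurrences matters.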