Drop rows duplicated on two columns, but keep some based on the value of a third column (pandas)

Question

I'm looking for a way to remove all rows that are duplicated on Barcode and Product No., but to keep the duplicated rows that belong to the latest input. Example below:

What I have:

Input ID	Barcode	Product No.
001	225	111
001	225	111
001	225	111
002	225	111
002	225	111
002	225	111
002	225	111
003	226	222
003	226	222
004	226	222
004	226	222
005	227	222
005	227	222
006	227	222
006	227	222

Output:

Input ID	Barcode	Product No.
002	225	111
002	225	111
002	225	111
002	225	111
004	226	222
004	226	222
006	227	222
006	227	222

You can see that where the Barcode and Product No. are the same, all rows except those with the highest Input ID have been removed, leaving only the duplicates that have the latest input.

Thanks, Oli

Answer 1

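You could run duplicated with keep='last' to identify the last row of each Product No. / Barcode pair, then extend the selection to every row sharing that row's Input ID using groupby + transform('any'). This relies on the rows being ordered so that the latest Input ID comes last within each pair, as in the example: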

df[((~df[['Product No.', 'Barcode']].duplicated(keep='last'))
   .groupby(df['Input ID']).transform('any'))]

output:

    Input ID  Barcode  Product No.
3          2      225          111
4          2      225          111
5          2      225          111
6          2      225          111
9          4      226          222
10         4      226          222
13         6      227          222
14         6      227          222
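
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's table, assuming Input ID is stored as integers (as the output above suggests). The second selection, using transform('max'), is not part of the answer above; it is an alternative that compares each row's Input ID with the highest Input ID of its Barcode / Product No. pair, so it does not depend on row order.

```python
import pandas as pd

# Sample data rebuilt from the question (Input ID assumed to be integer).
df = pd.DataFrame({
    "Input ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "Barcode": [225] * 7 + [226] * 4 + [227] * 4,
    "Product No.": [111] * 7 + [222] * 8,
})

# Approach from the answer: flag the last row of each pair, then keep
# every row whose Input ID group contains a flagged row.
is_last = ~df[["Product No.", "Barcode"]].duplicated(keep="last")
by_position = df[is_last.groupby(df["Input ID"]).transform("any")]

# Alternative (an assumption, not from the answer): keep rows whose
# Input ID equals the maximum Input ID within their pair.
latest = df.groupby(["Barcode", "Product No."])["Input ID"].transform("max")
by_max = df[df["Input ID"] == latest]

print(by_max)
```

On this sample both selections return the same rows. They can differ if the data is not sorted by Input ID, or if one Input ID is shared across several Barcode / Product No. pairs.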

Solution 1
2 2022-01-18 15:19:36
