Pandas: group by multiple columns and drop rows based on multiple conditions

Question

I have a DataFrame that looks like this:

imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,1,491,182,78,1,1
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,5,451,95,48,2,1
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,455,342,84,93,9,-7
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130

It's a CSV dump. From it, I want to group by imagename and brandname. Wherever the values in xdiff and ydiff are less than 10, remove the second line.

For example, from the first two lines I want to delete the second line; similarly, from lines 3 and 4 I want to delete line 4.
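A minimal sketch of this rule in pandas, under two assumptions the question does not spell out: (1) each row is compared with the previous row of the same (imagename, brandname) group, recomputed with `GroupBy.diff()` rather than read from the existing xdiff/ydiff columns, which appear to hold differences against the previous row of the whole file; and (2) "less than 10" means an absolute difference, since the expected output keeps the DHFL row whose ydiff is -41. The helper name `drop_near_duplicates` and its `threshold` parameter are hypothetical.

```python
import io

import pandas as pd

# The CSV dump from the question.
CSV = """\
imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,1,491,182,78,1,1
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,5,451,95,48,2,1
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,455,342,84,93,9,-7
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130
"""


def drop_near_duplicates(df, threshold=10):
    """Hypothetical helper: drop a row when its x and y are both within
    `threshold` of the previous row in the same (imagename, brandname) group."""
    grouped = df.groupby(['imagename', 'brandname'])
    # Per-group lag differences (x - lag(x)); the first row of a group gets NaN.
    dx = grouped['x'].diff()
    dy = grouped['y'].diff()
    # NaN compares as False, so the first row of every group is always kept.
    near = dx.abs().lt(threshold) & dy.abs().lt(threshold)
    return df[~near]


df = pd.read_csv(io.StringIO(CSV))
result = drop_near_duplicates(df)
print(result)
```

On the sample data this keeps the rows labelled 0, 2, 4, 5 and 7 in their original order, which are the rows of the expected output given in the question.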

I could do this quickly in R using dplyr's group_by, lag and lead functions. However, I am not sure how to combine the corresponding functions in Python to achieve this. This is what I have tried so far:

df[df.groupby(['imagename','brandname']).xdiff.transform() <= 10]

I am not sure which function I should call within transform, or how to include ydiff too.
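`transform` has to be given a function (or the name of one) to run on each group, and it returns a result aligned with the original rows, which is what makes it usable as a row filter. A sketch on the first four rows of the data, assuming "less than 10" means an absolute difference from the previous row of the same group: dplyr's `lag()`/`lead()` correspond to `shift(1)`/`shift(-1)` within a group, and ydiff is handled by building a second mask and combining the two with `&`.

```python
import pandas as pd

IMG = '95-20180407-215120-235505-00050.jpg'
# First four rows of the question's data (x and y only).
df = pd.DataFrame({
    'imagename': [IMG] * 4,
    'brandname': ['SAMSUNG', 'SAMSUNG', 'DHFL', 'DHFL'],
    'x': [0, 1, 3, 5],
    'y': [490, 491, 450, 451],
})

g = df.groupby(['imagename', 'brandname'])

# transform() needs a function; this one subtracts the previous value in the
# same group, i.e. x - lag(x) in dplyr terms (shift(-1) would play lead()).
dx = g['x'].transform(lambda s: s - s.shift(1))
dy = g['y'].transform(lambda s: s - s.shift(1))

# Combine the x and y conditions with & (element-wise AND).
# First rows of each group have NaN differences, compare as False, and are kept.
near = dx.abs().lt(10) & dy.abs().lt(10)
result = df[~near]
print(result)
```

Here the second SAMSUNG row and the second DHFL row are dropped, leaving the rows labelled 0 and 2.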

The expected output is as follows:

imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130

Answer 1

You can take the individual groupby frames and apply the conditions through the apply function:

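# Variant that checks xdiff only:
#df.groupby(['imagename','brandname'],group_keys=False).apply(lambda x: x.iloc[range(0,len(x),2)] if x['xdiff'].lt(10).any() else x)
# Keep every other row (positions 0, 2, ...) of a group in which some xdiff and some ydiff are < 10; keep other groups whole
df.groupby(['imagename','brandname'],group_keys=False).apply(lambda x: x.iloc[range(0,len(x),2)] if (x['xdiff'].lt(10).any() and x['ydiff'].lt(10).any()) else x)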

Out:

    imagename   locationName    brandname   x   y   w   h   xdiff   ydiff
2   95-20180407-215120-235505-00050.jpg Shirt   DHFL    3   450 94  45  2   -41
5   95-20180407-215120-235505-00050.jpg Shirt   DHFL    446 349 99  90  279 30
7   95-20180407-215120-235505-00050.jpg Shirt   GOIBIBO 559 212 70  106 104 -130
0   95-20180407-215120-235505-00050.jpg Shirt   SAMSUNG 0   490 177 82  0   0
4   95-20180407-215120-235505-00050.jpg DUGOUT  VIVO    167 319 36  38  162 -132
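The output above comes back in group order (DHFL, GOIBIBO, SAMSUNG, VIVO) rather than in the original row order of the expected output. Because `group_keys=False` keeps the original row labels in the index, calling `sort_index()` on the result should restore the file order. A sketch that reruns the answer's call on a reduced frame holding only the columns its condition uses (values copied from the question):

```python
import pandas as pd

IMG = '95-20180407-215120-235505-00050.jpg'
# Only the grouping columns and the two columns the condition reads.
df = pd.DataFrame({
    'imagename': [IMG] * 8,
    'brandname': ['SAMSUNG', 'SAMSUNG', 'DHFL', 'DHFL',
                  'VIVO', 'DHFL', 'DHFL', 'GOIBIBO'],
    'xdiff': [0, 1, 2, 2, 162, 279, 9, 104],
    'ydiff': [0, 1, -41, 1, -132, 30, -7, -130],
})

# The answer's call, unchanged.
out = df.groupby(['imagename', 'brandname'], group_keys=False).apply(
    lambda x: x.iloc[range(0, len(x), 2)]
    if (x['xdiff'].lt(10).any() and x['ydiff'].lt(10).any()) else x)

# Rows come back grouped by brand; the original labels survive, so sorting
# the index puts the kept rows back in file order.
print(out.sort_index())
```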


Answer 1 (accepted, 2019-03-23 06:46:16)
