I have a dataframe which is as follows:
imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,1,491,182,78,1,1
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,5,451,95,48,2,1
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,455,342,84,93,9,-7
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130
Its a csv dump. From this I want to group by imagename and brandname. Wherever the values in xdiff and ydiff is less than 10 then remove the second line.
For example, from the first two lines I want to delete the second line, similarly from lines 3 and 4 I want to delete line 4.
I could do this quickly in R using dplyr group by, lag and lead functions. However, I am not sure how to combine different functions in python to achieve this. This is what I have tried so far:
df[df.groupby(['imagename','brandname']).xdiff.transform() <= 10]
Not sure what function should I call within transform and how to include ydiff
too.
The expected output is as follows:
imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130
You can take individual groupby frames and apply the conditions through apply
function
#df.groupby(['imagename','brandname'],group_keys=False).apply(lambda x: x.iloc[range(0,len(x),2)] if x['xdiff'].lt(10).any() else x)
df.groupby(['imagename','brandname'],group_keys=False).apply(lambda x: x.iloc[range(0,len(x),2)] if (x['xdiff'].lt(10).any() and x['ydiff'].lt(10).any()) else x)
Out:
imagename locationName brandname x y w h xdiff ydiff
2 95-20180407-215120-235505-00050.jpg Shirt DHFL 3 450 94 45 2 -41
5 95-20180407-215120-235505-00050.jpg Shirt DHFL 446 349 99 90 279 30
7 95-20180407-215120-235505-00050.jpg Shirt GOIBIBO 559 212 70 106 104 -130
0 95-20180407-215120-235505-00050.jpg Shirt SAMSUNG 0 490 177 82 0 0
4 95-20180407-215120-235505-00050.jpg DUGOUT VIVO 167 319 36 38 162 -132
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.