How to filter a pandas data table using two fields in Python?
I have the following pandas data table:
File_name River Confidance X Y W H T_Area Overlap_Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
10 test4.png BRIDGING 0.133226 1385 1049 1463 1084 100 1645.0
I want to find the records where "T_Area" == 0 or "T_Area" / "Overlap_Area" > 0.5 using groupby in pandas.
Item 10 should be dropped from the output since 100 / 1645 < 0.5.
df[df['T_Area'].eq(0) | df['T_Area'].div(df['Overlap_Area']).gt(0.5)]
Output:
File_name River Confidance X Y W H T_Area Overlap_Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
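A small sketch of why the eq(0) test is needed alongside the ratio test: when T_Area and Overlap_Area are both 0, the division yields NaN, and NaN > 0.5 is False. The frame below is an assumption built from four rows of the question's table (only the relevant columns), and the names ratio, is_zero, ratio_ok and kept are illustrative.

```python
import pandas as pd

# Four rows copied from the question's table (rows 0, 2, 9 and 10).
df = pd.DataFrame(
    {
        "File_name": ["test1.png", "test1.png", "test4.png", "test4.png"],
        "T_Area": [0, 1406, 0, 100],
        "Overlap_Area": [0.0, 296.0, 0.0, 1645.0],
    },
    index=[0, 2, 9, 10],
)

# 0 / 0.0 produces NaN rather than raising an error.
ratio = df["T_Area"].div(df["Overlap_Area"])

is_zero = df["T_Area"].eq(0)   # catches the rows whose ratio is NaN
ratio_ok = ratio.gt(0.5)       # NaN > 0.5 evaluates to False

kept = df[is_zero | ratio_ok]
print(kept)
```

With the ratio test alone, rows 0 and 9 would be dropped even though they satisfy "T_Area" == 0.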
You can use:
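# m1: rows where T_Area is exactly 0 (0 / 0.0 is NaN, so the ratio test alone would miss them)
m1 = df['T_Area'] == 0
# m2: rows where T_Area / Overlap_Area exceeds 0.5
m2 = df['T_Area'] / df['Overlap_Area'] > 0.5
out = df[m1 | m2]
print(out)
# Output
File_name River Confidance X Y W H T_Area Overlap_Area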
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
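As an alternative that is not part of the answer above, the same row filter can probably be written with DataFrame.query, which works here because the underscore column names are valid Python identifiers. A minimal sketch, assuming a three-row sample taken from the question; engine="python" is passed so the example does not depend on numexpr being installed.

```python
import pandas as pd

# Rows 2, 9 and 10 from the question's table (only the relevant columns).
df = pd.DataFrame(
    {
        "File_name": ["test1.png", "test4.png", "test4.png"],
        "T_Area": [1406, 0, 100],
        "Overlap_Area": [296.0, 0.0, 1645.0],
    },
    index=[2, 9, 10],
)

# Same condition as the boolean masks, written as one expression string.
out = df.query("T_Area == 0 or T_Area / Overlap_Area > 0.5", engine="python")
print(out)
```

Whether the query string or the explicit masks read better is a matter of taste; the masks are easier to reuse, for example in a group-level check.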
Update
If you want to remove all rows from a group (file name) where at least one row violates the conditions, use groupby followed by transform:
# Keep a file's rows only if every row of that file satisfies m1 | m2
out = df[(m1 | m2).groupby(df['File_name']).transform('all')]
print(out)
# Output
File_name River Confidance X Y W H T_Area Overlap_Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
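The group-level step can be sketched on a reduced frame. This assumes five rows taken from the question's table; row_ok and group_ok are illustrative names. The row-level mask is grouped by file name and reduced with "all", and transform broadcasts each group's result back to every row of that group.

```python
import pandas as pd

# Rows 0, 2, 6, 9 and 10 from the question's table (only the relevant columns).
df = pd.DataFrame(
    {
        "File_name": ["test1.png", "test1.png", "test2.png", "test4.png", "test4.png"],
        "T_Area": [0, 1406, 0, 0, 100],
        "Overlap_Area": [0.0, 296.0, 0.0, 0.0, 1645.0],
    },
    index=[0, 2, 6, 9, 10],
)

# Row-level condition: T_Area is 0, or the ratio exceeds 0.5.
row_ok = df["T_Area"].eq(0) | df["T_Area"].div(df["Overlap_Area"]).gt(0.5)

# True for every row of a file only if all rows of that file pass.
group_ok = row_ok.groupby(df["File_name"]).transform("all")

out = df[group_ok]
print(out)
```

Here test4.png is removed entirely because row 10 fails, even though row 9 passes on its own.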
I hope this helps:
df.groupby(['File_name']).apply(lambda x: x[(x['T_Area']/x['Overlap_Area']>0.5) | (x['T_Area']==0)])
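One thing to watch with the apply version: in recent pandas versions the group key is added as an extra index level by default, so the result is indexed by (File_name, original index). A hedged sketch, assuming a four-row sample from the question and a hypothetical helper keep_rows, that passes group_keys=False to keep the original index and compares the selected rows with a plain boolean mask. Depending on the pandas version, apply may warn about, or exclude, the grouping column inside each group.

```python
import pandas as pd

# Rows 0, 2, 9 and 10 from the question's table (only the relevant columns).
df = pd.DataFrame(
    {
        "File_name": ["test1.png", "test1.png", "test4.png", "test4.png"],
        "T_Area": [0, 1406, 0, 100],
        "Overlap_Area": [0.0, 296.0, 0.0, 1645.0],
    },
    index=[0, 2, 9, 10],
)

def keep_rows(g):
    # Same row-level condition as in the answer, applied to one group.
    return g[(g["T_Area"] / g["Overlap_Area"] > 0.5) | (g["T_Area"] == 0)]

# group_keys=False keeps the original row labels instead of adding File_name.
by_apply = df.groupby("File_name", group_keys=False).apply(keep_rows)

# The same rows selected without groupby.
mask = (df["T_Area"] / df["Overlap_Area"] > 0.5) | (df["T_Area"] == 0)
by_mask = df[mask]

print(by_apply.index.tolist() == by_mask.index.tolist())
```

Since the condition is evaluated row by row, grouping does not change which rows are selected here; groupby only matters when whole groups should be kept or dropped.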