
How to filter a Pandas data table using two fields in Python?

I have the following pandas data table:

    File_name     River  Confidance     X     Y     W     H  T_Area  Overlap_Area
0   test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1   test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2   test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3   test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4   test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5   test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6   test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7   test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8   test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9   test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0
10  test4.png  BRIDGING    0.133226  1385  1049  1463  1084     100        1645.0

I want to find the records where "T_Area" == 0 or "T_Area" / "Overlap_Area" > 0.5 using groupby in pandas.

Item 10 should be dropped from the output, since 100 / 1645 < 0.5.

df[df['T_Area'].eq(0) | df['T_Area'].div(df['Overlap_Area']).gt(0.5)]

Output:

   File_name     River  Confidance     X     Y     W     H  T_Area  Overlap_Area
0  test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7  test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8  test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9  test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0
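For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the column names from the question's table and copies only the three columns the filter needs; the names ratio, keep and result are mine. It also shows why the rows with 0 / 0.0 survive: the division gives NaN, NaN > 0.5 is False, and the eq(0) test keeps them.

```python
import pandas as pd

# Only the columns the filter needs, copied from the question's table.
df = pd.DataFrame({
    "File_name": ["test1.png"] * 6 + ["test2.png"] + ["test4.png"] * 4,
    "T_Area": [0, 0, 1406, 0, 0, 1640, 0, 0, 0, 0, 100],
    "Overlap_Area": [0.0, 0.0, 296.0, 0.0, 0.0, 1240.0,
                     0.0, 0.0, 0.0, 0.0, 1645.0],
})

# 0 / 0.0 yields NaN in pandas rather than raising an error.
ratio = df["T_Area"].div(df["Overlap_Area"])

# NaN > 0.5 evaluates to False, so the eq(0) test is what keeps those rows.
keep = df["T_Area"].eq(0) | ratio.gt(0.5)

result = df[keep]
print(result.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Row 10 is the only one dropped, which matches the output shown above.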

You can use:

# m1: rows where T_Area is zero
m1 = df['T_Area'] == 0
# m2: rows where the ratio T_Area / Overlap_Area exceeds 0.5
m2 = df['T_Area'] / df['Overlap_Area'] > 0.5
out = df[m1 | m2]
print(out)

# Output
   File_name     River  Confidance     X     Y     W     H  T_Area  Overlap_Area
0  test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7  test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8  test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9  test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0

Update

If you want to remove all rows from a group (file name) where at least one row violates the conditions, use groupby with transform:

# The per-group minimum of a boolean mask is True only if every row in the group is True
out = df[(m1 | m2).groupby(df['File_name']).transform('min')]
print(out)

# Output
   File_name     River  Confidance    X     Y     W     H  T_Area  Overlap_Area
0  test1.png  BRIDGING    0.587851  739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243  980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472  966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194  275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075  944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732  929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589  886  1199   963  1248       0           0.0
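A self-contained variant of the group-level filter, in case it helps. This is only a sketch: it assumes the question's column names, rebuilds just the three relevant columns, and uses the string alias "all" where the code above takes the per-group minimum. The names ok, group_ok and kept are mine.

```python
import pandas as pd

# Only the columns the filter needs, copied from the question's table.
df = pd.DataFrame({
    "File_name": ["test1.png"] * 6 + ["test2.png"] + ["test4.png"] * 4,
    "T_Area": [0, 0, 1406, 0, 0, 1640, 0, 0, 0, 0, 100],
    "Overlap_Area": [0.0, 0.0, 296.0, 0.0, 0.0, 1240.0,
                     0.0, 0.0, 0.0, 0.0, 1645.0],
})

# Row-level condition: T_Area is zero, or the ratio exceeds 0.5.
ok = df["T_Area"].eq(0) | df["T_Area"].div(df["Overlap_Area"]).gt(0.5)

# Broadcast "every row of this file passes" back onto each row of the file.
group_ok = ok.groupby(df["File_name"]).transform("all")

kept = df[group_ok]
print(kept["File_name"].unique().tolist())  # ['test1.png', 'test2.png']
```

All four test4.png rows disappear because row 10 fails the condition. For a boolean mask, "all" and the minimum give the same result; "all" just states the intent more directly.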

I hope this helps:

df.groupby(['File_name']).apply(lambda x: x[(x['T_Area']/x['Overlap_Area']>0.5) | (x['T_Area']==0)])
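Because this condition is evaluated row by row, grouping should not change which rows pass. Below is a rough sketch that checks this, assuming the question's column names. The helper passes, the group_keys=False argument and the column selection before apply are my own choices for the illustration, not part of the line above.

```python
import pandas as pd

# Only the columns the filter needs, copied from the question's table.
df = pd.DataFrame({
    "File_name": ["test1.png"] * 6 + ["test2.png"] + ["test4.png"] * 4,
    "T_Area": [0, 0, 1406, 0, 0, 1640, 0, 0, 0, 0, 100],
    "Overlap_Area": [0.0, 0.0, 296.0, 0.0, 0.0, 1240.0,
                     0.0, 0.0, 0.0, 0.0, 1645.0],
})


def passes(frame):
    """Hypothetical helper: boolean mask for the row-level condition."""
    return (frame["T_Area"] / frame["Overlap_Area"] > 0.5) | (frame["T_Area"] == 0)


# group_keys=False keeps the original row index instead of adding
# File_name as an outer index level.
per_group = (
    df.groupby("File_name", group_keys=False)[["T_Area", "Overlap_Area"]]
      .apply(lambda g: g[passes(g)])
)

# The same condition applied to the whole frame, without grouping.
plain = df[passes(df)]

print(per_group.index.tolist() == plain.index.tolist())  # True
```

Both approaches keep rows 0 to 9, so groupby only matters when the decision depends on the whole group.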

