如何使用 Python 中的两个字段过滤 Pandas 数据表？

Question

I have the following pandas data table :我有以下熊猫数据表：

    File_name     River  Confidance     X     Y     W     H  T_Area  Overlap_Area
0   test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1   test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2   test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3   test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4   test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5   test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6   test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7   test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8   test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9   test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0
10  test4.png  BRIDGING    0.133226  1385  1049  1463  1084     100        1645.0

I want to find the records where "T_Area" == 0 or "T_Area" / "Overlap_Area" > 0.5 using groupby in pandas.我想在熊猫中使用groupby找到"T_Area" == 0或"T_Area" / "Overlap_Area" > 0.5的记录。

Item 10 should be dropped in the output since 100 / 1645 < 0.5由于100 / 1645 < 0.5 ，第 10 项应在输出中删除

Answer 1

df[df['T Area'].eq(0) | df['T Area'].div(df['Overlap Area']).gt(0.5)]

Output:输出：

   File_name     River  Confidance     X     Y     W     H  T_Area  Overlap_Area
0  test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7  test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8  test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9  test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0

Answer 2

You can use:您可以使用：

m1 = df['T Area'] == 0
m2 = df['T Area'] / df['Overlap Area'] > 0.5
out = df[m1 | m2]
print(out)

# Output
   File name     River  Confidance     X     Y     W     H  T Area  Overlap Area
0  test1.png  BRIDGING    0.587851   739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243   980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472   966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194   275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075   944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732   929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589   886  1199   963  1248       0           0.0
7  test4.png  BRIDGING    0.594574   147  1390   224  1456       0           0.0
8  test4.png  BRIDGING    0.149411   150  1701   221  1732       0           0.0
9  test4.png  BRIDGING    0.145715  1385  1245  1462  1279       0           0.0

Update更新

If you want to remove all rows from a group (file name) where at least one row violate the conditions, use groupby_transform :如果要从至少有一行违反条件的组（文件名）中删除所有行，请使用groupby_transform ：

out = df[(m1 | m2).groupby(df['File name']).transform(min)]
print(out)

# Output
   File name     River  Confidance    X     Y     W     H  T Area  Overlap Area
0  test1.png  BRIDGING    0.587851  739   821   769   894       0           0.0
1  test1.png  BRIDGING    0.579243  980   286  1018   361       0           0.0
2  test1.png  BRIDGING    0.534472  966   935  1038   973    1406         296.0
3  test1.png  BRIDGING    0.530194  275   859   313   934       0           0.0
4  test1.png  BRIDGING    0.368075  944   516   976   589       0           0.0
5  test1.png  BRIDGING    0.132732  929   814  1000   856    1640        1240.0
6  test2.png  BRIDGING    0.748589  886  1199   963  1248       0           0.0

Answer 3

我希望这能帮到您：

df.groupby(['File_name']).apply(lambda x: x[(x['T_Area']/x['Overlap_Area']>0.5) | (x['T_Area']==0)])

如何使用 Python 中的两个字段过滤 Pandas 数据表？

问题描述

3 个解决方案

解决方案1
3 已采纳 2022-05-25 06:26:30

解决方案2
3 2022-05-25 06:28:11

解决方案3
0 2022-05-25 07:39:16

如何使用 Python 中的两个字段过滤 Pandas 数据表？

问题描述

3 个解决方案

解决方案1 3 已采纳 2022-05-25 06:26:30

解决方案2 3 2022-05-25 06:28:11

解决方案3 0 2022-05-25 07:39:16

解决方案1
3 已采纳 2022-05-25 06:26:30

解决方案2
3 2022-05-25 06:28:11

解决方案3
0 2022-05-25 07:39:16