Select multiple rows based on different column values

Question

I am trying to evaluate some images based on their classification. I use the following piece of code to read the csv file:

import pandas as pd
file = pd.read_csv('test.csv', header=None)

So I have something that looks like this:

Image1  2  3  4  5  Green
Image1  3  4  5  6  Red
Image2  4  5  6  7  Red
Image3  1  4  8  9  Green
Image4  5  3  0  1  Yellow
Image4  6  2  1  1  Green

So, if I want to keep the images that have the value "Green", the output should look like this:

Image1  2  3  4  5  Green
Image1  3  4  5  6  Red
Image3  1  4  8  9  Green
Image4  5  3  0  1  Yellow
Image4  6  2  1  1  Green

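In other words, I want to keep every row that shares the same id in the first column whenever at least one of those rows has the value I am checking for in the last column.

I used the isin method, but I don't know how to also keep the remaining rows of the images that have the value "Green" in the last column at least once.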

Answer 1

You can use loc to find the values in the first column where the 6th column is Green, and pass those as the values to isin:

df[df[0].isin(df.loc[df[5] == "Green", 0])]
# if it has to be the last column, instead of the 6th column, use `iloc` instead:
# df[df[0].isin(df.loc[df.iloc[:, -1] == "Green", 0])]

Image1  2  3  4  5  Green
Image1  3  4  5  6  Red
Image3  1  4  8  9  Green
Image4  5  3  0  1  Yellow
Image4  6  2  1  1  Green

Breaking it down:

The inner loc retrieves the first-column values of the rows that contain Green:

df.loc[df[5] == "Green", 0] 
0    Image1
3    Image3
5    Image4
Name: 0, dtype: object

Passing that to isin gives you a boolean mask that is True wherever the first column matches one of those values:

df[0].isin(df.loc[df[5] == "Green", 0])
0     True
1     True
2    False
3     True
4     True
5     True
Name: 0, dtype: bool

You can use that mask to filter your df:

df[df[0].isin(df.loc[df[5] == "Green", 0])]
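For anyone who wants to try this without the csv file, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question (integer column labels, as with header=None), and keep_groups_with is a hypothetical helper name introduced only for illustration; it is not part of pandas.

```python
import pandas as pd

# Sample rows copied from the question; default integer column labels
# mimic what read_csv(..., header=None) would produce.
df = pd.DataFrame([
    ["Image1", 2, 3, 4, 5, "Green"],
    ["Image1", 3, 4, 5, 6, "Red"],
    ["Image2", 4, 5, 6, 7, "Red"],
    ["Image3", 1, 4, 8, 9, "Green"],
    ["Image4", 5, 3, 0, 1, "Yellow"],
    ["Image4", 6, 2, 1, 1, "Green"],
])


def keep_groups_with(frame, value, id_col=0, label_col=5):
    """Hypothetical helper: keep every row whose id has at least one row labelled `value`."""
    # Step 1: ids of the rows whose label equals `value`.
    ids = frame.loc[frame[label_col] == value, id_col]
    # Step 2: keep all rows whose id is among those ids.
    return frame[frame[id_col].isin(ids)]


result = keep_groups_with(df, "Green")
print(result)
```

Because the id and label columns are parameters, the same helper can be pointed at other columns or another value such as "Red" without rewriting the expression.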

Answer 2

We can use GroupBy.any here (through transform), which checks for each group whether any of its rows satisfies our condition:

df[df[5].eq("Green").groupby(df[0]).transform("any")]

        0  1  2  3  4       5
0  Image1  2  3  4  5   Green
1  Image1  3  4  5  6     Red
3  Image3  1  4  8  9   Green
4  Image4  5  3  0  1  Yellow
5  Image4  6  2  1  1   Green
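As a rough sketch of how the groupby approach can be checked, the snippet below rebuilds the sample data by hand (an assumption, since the original reads it from test.csv) and compares the transform("any") mask with GroupBy.filter, an alternative that is not in the answer above but should select the same rows.

```python
import pandas as pd

# Sample rows copied from the question; integer column labels as with header=None.
df = pd.DataFrame([
    ["Image1", 2, 3, 4, 5, "Green"],
    ["Image1", 3, 4, 5, 6, "Red"],
    ["Image2", 4, 5, 6, 7, "Red"],
    ["Image3", 1, 4, 8, 9, "Green"],
    ["Image4", 5, 3, 0, 1, "Yellow"],
    ["Image4", 6, 2, 1, 1, "Green"],
])

# Row-aligned boolean mask: True for every row whose group (column 0)
# contains at least one "Green" in column 5.
mask = df[5].eq("Green").groupby(df[0]).transform("any")
by_transform = df[mask]

# Alternative: drop whole groups that have no "Green" row.
by_filter = df.groupby(0).filter(lambda g: g[5].eq("Green").any())

print(by_transform.index.tolist())
print(by_filter.index.tolist())
```

The transform version builds a mask and avoids calling a Python lambda per group, which tends to matter on large frames; filter reads more directly as "keep groups where the condition holds".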


Answer 1
1 2021-04-19 22:35:22

Answer 2
1 Accepted 2021-04-19 22:45:25
