Pandas: select rows and columns based on a boolean condition

Question

I have a pandas dataframe with about 50 columns and more than 100 rows. I want to select the columns 'col_x' and 'col_y' for the rows where 'col_z' < m. Is there a simple way to do this, similar to df[df['col3'] < m] and df[['colx','coly']] but combined?

Answer 1

Let's break down your problem. You want to:

1. Filter rows based on some boolean condition.
2. Select a subset of columns from the result.

For the first point, the condition you need is:

df["col_z"] < m

For the second requirement, specify the list of columns you need:

["col_x", "col_y"]

How do you combine these two to produce the expected output with pandas? The most straightforward way is to use loc:

df.loc[df["col_z"] < m, ["col_x", "col_y"]]

The first argument selects rows, and the second argument selects columns.
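To make the loc call above concrete, here is a minimal runnable sketch. The data, the extra column `other`, and the threshold `m = 20` are made up for illustration; only the column names `col_x`, `col_y`, and `col_z` come from the question.

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are invented.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": [5, 15, 25, 35],
    "other": ["p", "q", "r", "s"],
})
m = 20  # assumed threshold

# Row selector: boolean Series; column selector: list of labels.
result = df.loc[df["col_z"] < m, ["col_x", "col_y"]]
print(result)
```

With this data, only the first two rows satisfy `col_z < m`, so `result` holds those two rows and the two requested columns.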

More About loc

Think of this in terms of the relational algebra operations selection and projection. If you come from the SQL world, this equivalent should feel familiar. The above operation, in SQL syntax, would look like this:

SELECT col_x, col_y     -- projection on columns
FROM df
WHERE col_z < m         -- selection on rows
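The same two relational steps can also be written separately in pandas. The sketch below, using made-up data and an assumed threshold `m = 9`, applies selection and then projection as two steps and checks that the result matches the single loc call.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6], "col_z": [7, 8, 9]})
m = 9  # assumed threshold

selected = df[df["col_z"] < m]             # selection: WHERE col_z < m
projected = selected[["col_x", "col_y"]]   # projection: SELECT col_x, col_y

# The single loc call does both steps at once.
combined = df.loc[df["col_z"] < m, ["col_x", "col_y"]]

same = projected.equals(combined)
print(same)
```

For reading data the two forms give the same result. The single loc call is usually preferred when you later assign to the selection, because chained indexing can end up operating on a copy.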

pandas loc also allows you to specify index labels for selecting rows. For example, if you have a dataframe:

   col_x  col_y
a      1      4
b      2      5
c      3      6

To select the index labels a and c, and the column col_x, you would use:

df.loc[['a', 'c'], ['col_x']]

   col_x
a      1
c      3
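As a runnable version of the label-based example, the sketch below rebuilds the small dataframe shown above. The second selection is an addition for comparison: passing a bare column label instead of a one-element list returns a Series rather than a DataFrame.

```python
import pandas as pd

# Rebuild the example dataframe with string index labels.
df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# List of column labels: the result stays a DataFrame.
subset = df.loc[["a", "c"], ["col_x"]]

# Bare column label: the result is a Series.
series = df.loc[["a", "c"], "col_x"]

print(subset)
print(series)
```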

Alternatively, to select by a boolean condition (using a Series or array of bool values, as your original question asks), for example the rows where the value in col_x is odd:

df.loc[(df.col_x % 2).ne(0), ['col_y']]

   col_y
a      4
c      6

In detail, df.col_x % 2 computes the remainder of each value when divided by 2. ne(0) then compares each remainder to 0 and returns True where it is not equal (this is how the odd numbers are selected). Here's what that expression evaluates to:

(df.col_x % 2).ne(0)

a     True
b    False
c     True
Name: col_x, dtype: bool
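Putting the mask and the selection together, here is a short sketch on the same example dataframe. The operator form `!= 0` is shown as an equivalent spelling of `.ne(0)`. It is not part of the original answer, and which one to use is a matter of taste.

```python
import pandas as pd

df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# Boolean mask: True where col_x is odd.
mask = (df["col_x"] % 2).ne(0)

# Equivalent mask written with the comparison operator.
mask_alt = df["col_x"] % 2 != 0

# Use the mask as the row selector and a list as the column selector.
odd_rows = df.loc[mask, ["col_y"]]
print(odd_rows)
```

Bracket access (`df["col_x"]`) is used here instead of attribute access (`df.col_x`) because it also works for column names that are not valid Python identifiers or that collide with DataFrame attributes.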


Answer 1
17, accepted, 2017-12-30 16:17:22