Why can't I select data in a Python pandas DataFrame based on multiple OR conditions?

Question

I have a df with multiple columns and I am trying to select a subset of the data based on OR logic:

df[(df['col1']==0) | (df['col2']==0) | (df['col3']==0) | (df['col4']==0) |
   (df['col5']==0) | (df['col6']==0) | (df['col7']==0) | (df['col8']==0) |
   (df['col9']==0) | (df['col10']==0) | (df['col11']==0)]

When I apply this logic the result is empty, but I know some of the values are zero.

All the values in these columns are int64.

I noticed that the values in 'col11' are all 1's. When I remove 'col11' or change the order of the query (e.g., putting "| (df['col11']==0)" in the middle), I get the expected results.

I wonder if anyone has had this problem or has any idea why I'm getting an empty df.

Answer 1

Use (df==0).any(axis=1), which is True for every row that contains at least one zero.

Given df:

    a   b   c   d   e   f
0   6   8   7  19   3  14
1  14  19   3  13  10  10
2   6  18  16   0  15  12
3  19   4  14   3   8   3
4   4  14  15   1   6  11

>>> (df==0).any(axis=1)
0    False
1    False
2     True
3    False
4    False
dtype: bool
>>> # subset of the columns
>>> (df[['a','c','e']]==0).any(axis=1)
0    False
1    False
2    False
3    False
4    False
dtype: bool
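
The mask on its own only marks the rows; to actually select them, index the frame with it. The following is a minimal sketch that rebuilds the example frame above by hand (so it is reproducible) and assumes nothing beyond pandas itself. The names mask, rows_with_zero and chained are illustrative only.

```python
import pandas as pd

# The example frame shown above, typed in by hand so the result is reproducible.
df = pd.DataFrame(
    {
        "a": [6, 14, 6, 19, 4],
        "b": [8, 19, 18, 4, 14],
        "c": [7, 3, 16, 14, 15],
        "d": [19, 13, 0, 3, 1],
        "e": [3, 10, 15, 8, 6],
        "f": [14, 10, 12, 3, 11],
    }
)

# Boolean mask: True for rows that contain at least one zero.
mask = (df == 0).any(axis=1)

# Index the frame with the mask to keep only those rows.
rows_with_zero = df[mask]

# The chained-OR form from the question, limited to two columns for brevity.
chained = df[(df["c"] == 0) | (df["d"] == 0)]
```

On this frame both forms select the same single row (index 2), so the any(axis=1) version can be used as a shorter replacement for a long chain of | conditions.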

If the DataFrame is all integers, you can make use of the fact that zero is falsy and use

~df.all(axis=1)
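
As a quick check that the two forms agree on integer data, here is a small sketch using a made-up three-row frame (the values are hypothetical, chosen only so that two rows contain a zero):

```python
import pandas as pd

# Hypothetical integer-only frame: rows 1 and 2 each contain a zero.
df = pd.DataFrame({"a": [6, 14, 0], "b": [8, 0, 18], "c": [7, 3, 16]})

# Explicit comparison against zero.
explicit = (df == 0).any(axis=1)

# Truthiness form: all(axis=1) is False when any value in the row is 0,
# so negating it marks the rows that contain a zero.
via_truthiness = ~df.all(axis=1)

print(explicit.equals(via_truthiness))
```

The shortcut relies on every column holding integers; with strings or other dtypes, truthiness no longer means "not zero", so the explicit comparison is the safer choice there.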

To make fake data:

import numpy as np
import pandas as pd
rng = np.random.default_rng()
nrows = 5
df = pd.DataFrame(rng.integers(0,20,(nrows,6)),columns=['a', 'b', 'c', 'd','e','f'])

Solution 1
0 Accepted 2022-06-27 17:24:51