
Pandas: Select Rows with condition for several columns

I'm using this to conditionally select rows based on the value in column:

X.loc[data['column'] == 1]

But I want to expand this condition to several columns. These columns have something in common: their names contain the same string. So I actually have column1, column2, ..., column100 and so on, and the condition should apply to all of these columns. Something like this (wildcard):

X.loc[data['column*'] == 1]

These conditions should be linked with OR. Is there an easy way to do this?

For some dataframe X:

   p A  p B  p C
0    0    0    0
1    0    0    0
2    0    0    1
3    0    0    0
4    0    0    0
5    0    0    0
6    1    0    0

If you can put the names of the columns you want to test into col_list, for example:

col_list = X.columns

You can then use .any(axis=1) to combine the per-column tests with OR across each row:

X.loc[(X[col_list] == 1).any(axis=1)]

Which gives you:

   p A  p B  p C
2    0    0    1
6    1    0    0

As @MaartynFabre pointed out, you don't need loc here and will still get the same answer:

X[(X[col_list] == 1).any(axis=1)]

   p A  p B  p C
2    0    0    1
6    1    0    0
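To get closer to the wildcard in the question, col_list does not have to be every column. The sketch below is one possible way to pick only the columns whose names contain a shared string, using DataFrame.filter(like=...). The column names and values are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical data: three "column*" columns plus one unrelated column.
X = pd.DataFrame({
    "column1": [0, 0, 1, 0],
    "column2": [0, 0, 0, 1],
    "column3": [0, 0, 0, 0],
    "other":   [1, 1, 1, 1],
})

# Keep only the columns whose name contains the substring "column".
wanted = X.filter(like="column")

# True for rows where at least one of those columns equals 1 (OR across columns).
mask = (wanted == 1).any(axis=1)

# The mask is applied to the full frame, so "other" is still in the result.
selected = X[mask]
print(selected)
```

If a substring match is too loose, X.filter(regex=r"^column\d+$") should restrict the selection to names of the form column followed by digits.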

Test dataframe:

    col0 col1 col2
0   1    1    2
1   1    1    1
2   2    2    2

Build a boolean Series by running the test on every column and combining the results row by row:

result_s = pd.concat([df['col%i' % i] == 1 for i in range(3)], axis=1).all(axis=1)

This results in:

0    False
1     True
2    False
dtype: bool

If you then do df[result_s], you get:

    col0 col1 col2
1   1    1    1

This selects the rows where all columns are == 1. If a match in any one of them is enough, change .all() to .any(), which gives:

    col0 col1 col2
0   1    1    2
1   1    1    1
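The same all/any comparison can probably be written without pd.concat by comparing the whole sub-frame at once. This is a minimal sketch that rebuilds the test dataframe above; the variable names hits, all_rows and any_rows are only illustrative.

```python
import pandas as pd

# Rebuild the test dataframe shown above.
df = pd.DataFrame({
    "col0": [1, 1, 2],
    "col1": [1, 1, 2],
    "col2": [2, 1, 2],
})

# Element-wise comparison gives a boolean frame with the same shape.
hits = df[["col0", "col1", "col2"]] == 1

# Rows where every listed column equals 1.
all_rows = df[hits.all(axis=1)]

# Rows where at least one listed column equals 1.
any_rows = df[hits.any(axis=1)]

print(all_rows)
print(any_rows)
```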

Put each comparison in parentheses and combine them with the element-wise logical operators & (and) and | (or):

pd.DataFrame(X).loc[(data['col1']==23) & (data['col2']==42)] # and
pd.DataFrame(X).loc[(data['col1']==23) | (data['col2']==42)] # or
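The parentheses matter because & and | bind more tightly than == in Python, so an unparenthesized expression is grouped differently from what was intended. When there are many conditions, writing them out by hand gets tedious; one option is to collect them in a list and fold them together. The sketch below assumes a small made-up frame named data and uses functools.reduce with the standard operator module.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical data for illustration only.
data = pd.DataFrame({
    "col1": [23, 0, 23, 5],
    "col2": [42, 42, 0, 7],
})

# One boolean Series per condition.
conditions = [data["col1"] == 23, data["col2"] == 42]

# Fold the list with element-wise AND / OR.
both = reduce(operator.and_, conditions)
either = reduce(operator.or_, conditions)

and_rows = data.loc[both]    # rows satisfying every condition
or_rows = data.loc[either]   # rows satisfying at least one condition

print(and_rows)
print(or_rows)
```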

Here's another way to consider:

df
   col0  col1  col2
0     1     1     2
1     1     1     1
2     2     2     2

df.loc[df['col0'] == 1, [x for x in df.columns if x == 'col0']]
   col0
0     1
1     1

You can use a list comprehension to find the columns you're looking for.
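As a rough illustration of how the list comprehension could be combined with a row condition, the sketch below selects columns by name prefix and then keeps rows where any of them equals 1. The extra label column and the "col" prefix are assumptions added here so that the column filter has something to exclude.

```python
import pandas as pd

# Test dataframe from above plus a hypothetical non-matching column.
df = pd.DataFrame({
    "col0": [1, 1, 2],
    "col1": [1, 1, 2],
    "col2": [2, 1, 2],
    "label": ["a", "b", "c"],
})

# List comprehension: keep only column names that start with "col".
cols = [c for c in df.columns if c.startswith("col")]

# Row selector: any of the chosen columns equals 1.
# Column selector: only the chosen columns are returned.
result = df.loc[(df[cols] == 1).any(axis=1), cols]

print(result)
```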
