Most concise way to select rows where any column contains a string in Pandas dataframe?
What is the most concise way to select all rows where any column contains a string in a Pandas dataframe?
For example, given the following dataframe, what is the best way to select those rows where the value in any column contains a b?
import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo']
})
I'm inexperienced with Pandas and the best I've come up with so far is the rather cumbersome
df[df.apply(lambda r: r.str.contains('b').any(), axis=1)]
Is there a simpler solution?
Critically, I want to check for a match in any column, not a particular column. Other similar questions, as best I can tell, only address a single column or a list of columns.
This question was never given a posted answer, but the question itself and its comments already contain solutions that worked really well for me, and I couldn't find them anywhere else I looked.
So I have copied them here for anyone who may find them useful. I added case=False for a case-insensitive search.
Solution from @Reason (described in the question as "the best I've come up with so far"):
This one worked for me.
df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)]
Solution from @rbinnun:
This one worked for me on a test dataset, but on some real data it raised the Unicode error shown below. I think it is generally a good solution too.
df[df.apply(lambda row: row.astype(str).str.contains('b', case=False).any(), axis=1)]
It takes care of non-string columns, NaNs, etc.
The error on the real dataset was:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 5: ordinal not in range(128)
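To illustrate what the astype(str) variant does with non-string columns and missing values, here is a minimal sketch. The dataframe `mixed` and its column names are made up for this example and are not from the original thread.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe mixing strings, integers, floats and missing values
mixed = pd.DataFrame({
    "name": ["foo", "bar", None],
    "count": [1, 2, 3],
    "score": [0.5, np.nan, 1.5],
})

# Cast every cell in the row to str first, so .str.contains works on all of them
mask = mixed.apply(
    lambda row: row.astype(str).str.contains("b", case=False).any(),
    axis=1,
)
result = mixed[mask]  # keeps only the row whose name is "bar"
```

One caveat worth keeping in mind: astype(str) turns NaN into the text 'nan', so a search pattern such as 'nan' or 'a' would also match cells that were originally missing.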
df.apply is too slow when working with lots (millions) of rows. Look for something else.
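One possible alternative, offered here as an untested-for-speed sketch rather than something from the thread, is to call str.contains once per column instead of once per row and then combine the results with any(axis=1). The number of Python-level function calls then grows with the number of columns rather than the number of rows, which should help on tall dataframes. I have not benchmarked it.

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["foo", "foo", "bar"],
    "y": ["foo", "foo", "foo"],
    "z": ["foo", "baz", "foo"],
})

# apply with the default axis=0 passes each column (a Series) to the lambda,
# so str.contains runs once per column and returns a boolean dataframe
cell_hits = df.apply(lambda col: col.astype(str).str.contains("b", regex=False))

# A row is kept if any of its cells matched
mask = cell_hits.any(axis=1)
result = df[mask]
```

regex=False treats 'b' as a literal substring. Drop it if the pattern is meant to be a regular expression.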
If you don't like apply:
df.stack()[df.stack().str.contains("b")]
returns
1  z    baz
2  x    bar
dtype: object
and the same result, reshaped to look more like the original table:
df.stack()[df.stack().str.contains("b")].reset_index().pivot(index="level_0", columns="level_1").droplevel(0, 1)
level_1 | x | z |
---|---|---|
1 | NaN | baz |
2 | bar | NaN |
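The stack-based results above contain only the matching cells. If the goal is the complete matching rows, as the question asks, the same stacked mask can be used to look up row labels. This is a sketch under the assumption that all cells are strings. The names `stacked`, `hits` and `matching_rows` are my own.

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["foo", "foo", "bar"],
    "y": ["foo", "foo", "foo"],
    "z": ["foo", "baz", "foo"],
})

# Stack once and reuse it, rather than calling df.stack() twice
stacked = df.stack()
hits = stacked.str.contains("b")

# Level 0 of the stacked index holds the original row labels
matching_rows = hits[hits].index.get_level_values(0).unique()

# Select the complete rows from the original dataframe
result = df.loc[matching_rows]
```

With the example dataframe this selects rows 1 and 2 with all three columns intact.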