Most concise way to select rows where any column contains a string in a Pandas dataframe?

Question

What is the most concise way to select all rows where any column contains a string in a Pandas dataframe?

For example, given the following dataframe, what is the best way to select those rows where the value in any column contains a b?

import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo']
})

I'm inexperienced with Pandas and the best I've come up with so far is the rather cumbersome df[df.apply(lambda r: r.str.contains('b').any(), axis=1)]. Is there a simpler solution?

Critically, I want to check for a match in any column, not in a particular column. Other similar questions, as best I can tell, only address a single column or a list of columns.

Answer 1

This question was never given a formal answer, but the question itself and its comments already contain the answer, which worked really well for me, and I didn't find it anywhere else I looked.

So I have simply copied the answer here for anyone who may find it useful. I added case=False for a case-insensitive search.

Solution from @Reason:

the best I've come up with so far is the rather cumbersome

This one worked for me.

df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)]

Solution from @rbinnun:

This one worked for me on a test dataset, but on some real data it raised the Unicode error shown below. I still think it is generally a good solution.

df[df.apply(lambda row: row.astype(str).str.contains('b', case=False).any(), axis=1)]

It takes care of non-string columns, NaNs, etc.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 5: ordinal not in range(128)
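
To make the difference between the two variants concrete, here is a minimal sketch on a frame that mixes strings, integers and a missing value. The frame `mixed` and the helper name `rows_containing` are made up for illustration and do not come from the original comments.

```python
import pandas as pd

# Hypothetical frame: a numeric column and a missing value, which would
# make the plain r.str.contains(...) variant fail or return NaN.
mixed = pd.DataFrame({
    'x': ['foo', 'Bar', None],
    'n': [1, 2, 3],
    'z': ['foo', 'foo', 'baz'],
})

def rows_containing(frame, pattern, case=False):
    """Return the rows of `frame` where any cell, cast to str, matches `pattern`."""
    mask = frame.apply(
        # astype(str) lets .str.contains run on every cell, whatever its dtype
        lambda row: row.astype(str).str.contains(pattern, case=case).any(),
        axis=1,
    )
    return frame[mask]

result = rows_containing(mixed, 'b')
print(result)
```

Two things to keep in mind: the pattern is treated as a regular expression by default, and depending on the pandas version, casting to str can turn missing values into the literal text "None" or "nan", so a search for a pattern such as 'n' may match empty cells.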

Answer 2

df.apply is too slow when working with lots (millions) of rows. Look for something else.
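
One possible "something else" is to loop over columns instead of rows, so that each column is searched with a single vectorized call. This is only a sketch under that assumption; the helper name `any_column_contains` is hypothetical, and whether it is actually faster should be measured on your own data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

def any_column_contains(frame, pattern, case=False):
    """Return the rows where at least one column matches `pattern`."""
    mask = np.zeros(len(frame), dtype=bool)
    for _, col in frame.items():
        # One vectorized .str.contains call per column; na=False treats
        # missing values as non-matches so the result is purely boolean.
        hits = col.astype(str).str.contains(pattern, case=case, na=False)
        mask |= hits.to_numpy(dtype=bool)
    return frame[mask]

fast = any_column_contains(df, 'b')
print(fast)
```

The number of Python-level iterations here is the number of columns rather than the number of rows, which is the reason to expect less overhead on tall frames.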

Answer 3

If you don't like apply:

df.stack()[df.stack().str.contains("b")]

returns

1  z    baz
2  x    bar
dtype: object

and, as above, but with properties similar to the original table:

df.stack()[df.stack().str.contains("b")].reset_index().pivot(index="level_0", columns="level_1").droplevel(0, 1)

level_1    x    z
1        NaN  baz
2        bar  NaN
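
The stack-based results above show only the matching cells. If the goal is the complete rows, as in the question, one way to adapt the idea is sketched below; it stacks once, keeps the row labels of the matching cells, and looks those rows up in the original frame. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

# Stack once: a Series indexed by (row label, column name)
stacked = df.stack()

# Keep the cells that contain 'b', then take their row labels (index level 0)
matches = stacked[stacked.str.contains('b', na=False)]
hit_rows = matches.index.get_level_values(0).unique()

# Look the matching rows up in the original frame to get all their columns
selected = df.loc[hit_rows]
print(selected)
```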

Answer 1
25 2017-03-25 15:30:33

Answer 2
0 2020-12-23 09:21:36

Answer 3
0 2023-01-06 17:36:00
