
Most concise way to select rows where any column contains a string in Pandas dataframe?

What is the most concise way to select all rows where any column contains a string in a Pandas dataframe?

For example, given the following dataframe, what is the best way to select those rows where the value in any column contains a b?

import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo']
})

I'm inexperienced with Pandas, and the best I've come up with so far is the rather cumbersome df[df.apply(lambda r: r.str.contains('b').any(), axis=1)]. Is there a simpler solution?

Critically, I want to check for a match in any column, not a particular column. Other similar questions, as best I can tell, only address a single column or a list of columns.

This question was never given an answer, but the question itself and the comments already contain answers that worked really well for me, and I didn't find them anywhere else I looked.

So I am copy-pasting the answers here for anyone who may find them useful. I added case=False for a case-insensitive search.

Solution from @Reason:

"...the best I've come up with so far is the rather cumbersome..."

This one worked for me.

df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)] 
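To see what case=False changes, here is a small sketch on the question's frame with one value capitalized ('Bar', changed only for illustration). The regex=False argument is an optional addition, not part of the original answer: str.contains treats the pattern as a regular expression by default, so a literal search string containing characters such as '.' or '(' would otherwise need escaping.

```python
import pandas as pd

# The question's frame, with 'bar' capitalized to 'Bar' to exercise case=False
df = pd.DataFrame({
    'x': ['foo', 'foo', 'Bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

# Row-wise: for each row r, test every cell for 'b' ignoring case, and keep the row
# if any cell matches. regex=False makes 'b' a plain substring, not a regex.
result = df[df.apply(lambda r: r.str.contains('b', case=False, regex=False).any(), axis=1)]
print(result)
```

Without case=False, only row 1 ('baz') would be selected, because 'Bar' has no lowercase b.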

Solution from @rbinnun:

This one worked for me on a test dataset, but on a real dataset it raised the Unicode error shown below. Still, I think it is generally a good solution too.

df[df.apply(lambda row: row.astype(str).str.contains('b', case=False).any(), axis=1)]

This takes care of non-string columns, NaNs, etc.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 5: ordinal not in range(128)
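A quick check of the astype(str) version, as a sketch on a hypothetical frame that mixes strings, a numeric column, a NaN and the non-ASCII character from the error above (U+00AE, ®). The u'\xae' in that message suggests the code ran under Python 2, where str() encodes unicode values with the ASCII codec; this sketch assumes Python 3, where str is already Unicode and the same call should not raise.

```python
import numpy as np
import pandas as pd

# Hypothetical data: strings, an int column, a NaN and a non-ASCII value (U+00AE)
df = pd.DataFrame({
    'x': ['foo', np.nan, 'bar', 'Brand\u00ae'],
    'n': [1, 2, 3, 4],
    'z': ['foo', 'baz', 'foo', 'foo'],
})

# astype(str) converts every cell (ints, NaN, '®') to text, so .str is always valid.
# In pandas 1.x/2.x NaN becomes the literal string 'nan', which a pattern like 'n' would match.
mask = df.apply(lambda row: row.astype(str).str.contains('b', case=False).any(), axis=1)
print(df[mask])
```

Row 3 is selected through the 'B' in 'Brand®' (thanks to case=False), and the non-ASCII character causes no error.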

df.apply is too slow when working with many rows (millions). Look for something else.
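One alternative that avoids a Python-level call per row, offered as a suggestion rather than a benchmarked result: run the vectorized .str.contains once per column, then combine the columns with any(axis=1). With millions of rows and a handful of columns, this should make far fewer Python calls than apply(..., axis=1).

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

# Column-wise: apply runs once per column (3 calls here) instead of once per row,
# and each call is a vectorized .str.contains over the whole column.
cell_matches = df.astype(str).apply(lambda col: col.str.contains('b', case=False, regex=False))

# Keep the rows where at least one column matched
result = df[cell_matches.any(axis=1)]
print(result)
```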

If you don't like apply:

df.stack()[df.stack().str.contains("b")]

returns:

1  z    baz
2  x    bar
dtype: object
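The stacked result lists matching cells, not rows. A sketch of one way (not from the original comments) to collapse it back into a row filter on the original frame: group the per-cell booleans by the first index level and take any(). Recent pandas versions may print a FutureWarning about stack()'s implementation; it does not change this result.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

# stack() yields one entry per (row, column) cell; test each cell for 'b'
stacked_matches = df.stack().str.contains('b')

# Collapse back to one boolean per original row (level 0 of the MultiIndex)
row_mask = stacked_matches.groupby(level=0).any()

# reindex guards against rows stack() may drop (e.g. all-NaN rows when NaNs are dropped)
row_mask = row_mask.reindex(df.index, fill_value=False)
print(df[row_mask])
```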

And as above, but with properties similar to the original table:

df.stack()[df.stack().str.contains("b")].reset_index().pivot(index="level_0", columns="level_1").droplevel(0, 1)
level_1    x    z
1        NaN  baz
2        bar  NaN
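A shorter route to a similar table, sketched without stack/pivot: mask the frame with where(), which leaves NaN in non-matching cells, then drop the rows and columns that have no match at all. Unlike the pivot version, the result keeps the original index and column labels without the level_0/level_1 names.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

# Boolean frame: True where a cell contains 'b'
cell_matches = df.apply(lambda col: col.str.contains('b'))

# Keep matching cells (others become NaN), then drop rows and columns with no match
table = df.where(cell_matches).dropna(how='all').dropna(axis=1, how='all')
print(table)
```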


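Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.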

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM