
Check if at least one column contains a string in pandas

I would like to check whether any of several columns contains a string, and generate a Boolean column with the result. This is easy to do for a single column, but raises an AttributeError (AttributeError: 'DataFrame' object has no attribute 'str') when the same method is applied to multiple columns.

Example:

import pandas as pd

c1=[x+'x' for x in 'abcabc']
c2=['Y'+x+'m' for x in 'CABABC']
cols=['A','B']

df=pd.DataFrame(list(zip(c1,c2)),columns=cols)
df

Returns:

    A   B
0   ax  YCm
1   bx  YAm
2   cx  YBm
3   ax  YAm
4   bx  YBm
5   cx  YCm

The following code works when applied to a single column, but fails when applied to several columns. I'd like something that fits in here and gives the desired result:

df['C']=df[cols].str.contains('c',case=False)
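The error arises because `.str` is an accessor on a Series, not on a DataFrame, and selecting with a list of columns returns a DataFrame. A minimal sketch that reproduces the difference on a shortened version of the example frame (the names `single` and `error_message` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['ax', 'bx', 'cx'], 'B': ['YCm', 'YAm', 'YBm']})
cols = ['A', 'B']

# A single column is a Series, so the .str accessor is available.
single = df['A'].str.contains('c', case=False)

# A list of columns yields a DataFrame, which has no .str accessor.
try:
    df[cols].str.contains('c', case=False)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(single.tolist())
print(error_message)
```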

Thus the desired output is:

    A   B   C
0   ax  YCm True
1   bx  YAm False
2   cx  YBm True
3   ax  YAm False
4   bx  YBm False
5   cx  YCm True

Edit: I updated my example to reflect that I actually want to search for whether the column "contains" a value, rather than "is equivalent to" that value.

Edit: in terms of timings, here's the benchmark I'd like to be able to match or beat, without creating the new columns (timed with the columns in my toy example multiplied by 1000):

newcols=['temp_'+x for x in cols]

for col in cols:
    df['temp_'+col]=df[col].str.contains('c',case=False)

df['C']=df[newcols].any(axis=1)
df=df[['A','B','C']]
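One way to keep the per-column speed of this loop without writing temporary columns is to hold the masks in a plain list and OR them together. This is only a sketch: the use of `np.logical_or.reduce` and of `regex=False` (treating 'c' as a literal substring) are assumptions added here, not part of the original benchmark.

```python
import numpy as np
import pandas as pd

c1 = [x + 'x' for x in 'abcabc']
c2 = ['Y' + x + 'm' for x in 'CABABC']
cols = ['A', 'B']
df = pd.DataFrame(list(zip(c1, c2)), columns=cols)

# One boolean mask per column, kept in a list instead of on the frame.
masks = [
    df[col].str.contains('c', case=False, regex=False).to_numpy()
    for col in cols
]

# Element-wise OR across the masks gives the row-wise "any" result.
df['C'] = np.logical_or.reduce(masks)
print(df)
```

Because nothing is added to `df` besides 'C', the final column re-selection from the benchmark is not needed.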

An option via applymap (deprecated since pandas 2.1 in favor of the equivalent DataFrame.map):

df['C'] = df[cols].applymap(lambda x: 'c' in str(x).lower()).any(axis=1)

Via stack/unstack (either line works):

df['C'] = df[cols].stack().str.contains('c', case=False).unstack().any(axis=1)
df['C'] = df[cols].stack().str.lower().str.contains('c').unstack().any(axis=1)

Output:

    A    B      C
0  ax  YCm   True
1  bx  YAm  False
2  cx  YBm   True
3  ax  YAm  False
4  bx  YBm  False
5  cx  YCm   True

I would run an apply across the columns and take the any() of the result:

df['C']=df[cols].apply(lambda y: y.str.contains('c',case=False),axis=1).any(axis=1)
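Since timings matter to the question, it may help to compare the row-wise apply with a column-wise variant, which calls `str.contains` once per column instead of once per row. The helper names `row_wise` and `column_wise` are hypothetical, and the sketch prints measured times rather than claiming any particular result:

```python
import timeit
import pandas as pd

# Toy example scaled up by 1000, as described in the question.
c1 = [x + 'x' for x in 'abcabc'] * 1000
c2 = ['Y' + x + 'm' for x in 'CABABC'] * 1000
cols = ['A', 'B']
df = pd.DataFrame(list(zip(c1, c2)), columns=cols)

def row_wise(frame):
    # Calls str.contains once per row (axis=1).
    return frame[cols].apply(
        lambda y: y.str.contains('c', case=False), axis=1
    ).any(axis=1)

def column_wise(frame):
    # Calls str.contains once per column (default axis=0).
    return frame[cols].apply(
        lambda s: s.str.contains('c', case=False)
    ).any(axis=1)

# Both variants should flag the same rows.
same = row_wise(df).tolist() == column_wise(df).tolist()
print('identical results:', same)

print('row-wise   :', timeit.timeit(lambda: row_wise(df), number=1))
print('column-wise:', timeit.timeit(lambda: column_wise(df), number=1))
```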
