Check if a pandas DataFrame string column contains all the elements given in an array

I have a dataframe as shown below:

>>> import pandas as pd
>>> df = pd.DataFrame(data = [['app;',1,2,3],['app; web;',4,5,6],['web;',7,8,9],['',1,4,5]],columns = ['a','b','c','d'])
>>> df
           a  b  c  d
0       app;  1  2  3
1  app; web;  4  5  6
2       web;  7  8  9
3             1  4  5

I have an input array that looks like this: ["app","web"]. For each of these values, I want to check a specific column of the dataframe and get a result like the one below:

>>> df.a.str.contains("app")
0     True
1     True
2    False
3    False

Since str.contains only lets me look for a single value, I was wondering if there is a more direct way to do the same check for several values, something like:

 df.a.str.contains(["app","web"]) # Returns TypeError: unhashable type: 'list'

My end goal is not an exact match (df.a.isin(["app", "web"])) but rather a 'contains' logic that returns True as long as those substrings are present in that cell of the dataframe.
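To make the distinction concrete, here is a minimal sketch using the same sample values as the question (the variable names exact and has_app are only illustrative). It contrasts whole-cell matching with isin against substring matching with str.contains:

```python
import pandas as pd

df = pd.DataFrame({"a": ["app;", "app; web;", "web;", ""]})

# isin compares whole cell values, so "app;" is not equal to "app"
exact = df["a"].isin(["app", "web"])

# str.contains does a substring search, one value at a time
has_app = df["a"].str.contains("app", regex=False)

print(exact.tolist())    # [False, False, False, False]
print(has_app.tolist())  # [True, True, False, False]
```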

Note: I can of course use the apply method with my own function for the same logic, such as:

elementsToLookFor = ["app","web"]
header = 'a'  # name of the column to check
result = df[header].apply(lambda element: all(a in element for a in elementsToLookFor))

But I am more interested in the optimal approach, so I would prefer a native pandas function, or failing that, the next most optimized custom solution.

This should work too:

l = ["app","web"]
df['a'].str.findall('|'.join(l)).map(lambda x: len(set(x)) == len(l))

This should work as well:

pd.concat([df['a'].str.contains(i) for i in l],axis=1).all(axis = 1)
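A possible variation on the line above, sketched under the assumption that the values should be treated as literal substrings rather than regular expressions: pass regex=False to str.contains and combine the per-value masks with &. The names masks and contains_all are illustrative only.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({"a": ["app;", "app; web;", "web;", ""]})
values = ["app", "web"]

# One boolean mask per value; regex=False treats each value literally
masks = [df["a"].str.contains(v, regex=False) for v in values]

# A row passes only if every mask is True for it
contains_all = reduce(operator.and_, masks)

print(contains_all.tolist())  # [False, True, False, False]
```

This avoids building an intermediate DataFrame, but whether it is any faster would need to be measured.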

Try with str.get_dummies:

df.a.str.replace(' ','').str.get_dummies(';')[['web','app']].all(1)
0    False
1     True
2    False
3    False
dtype: bool

Update

df['a'].str.contains(r'^(?=.*web)(?=.*app)')
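The pattern above chains one lookahead per value, so a row matches only if every value occurs somewhere in the string. A small sketch of building it from a list follows; the re.escape call is an addition not in the answer, assumed useful when values may contain regex metacharacters, and the column is created with object dtype here so that the pattern is evaluated by Python's re module.

```python
import re

import pandas as pd

# object dtype: the pattern is then evaluated by Python's re module
df = pd.DataFrame({"a": ["app;", "app; web;", "web;", ""]}, dtype=object)
values = ["app", "web"]

# One lookahead per value; each must succeed from the start of the string
pattern = "^" + "".join(f"(?=.*{re.escape(v)})" for v in values)

result = df["a"].str.contains(pattern)

print(pattern)          # ^(?=.*app)(?=.*web)
print(result.tolist())  # [False, True, False, False]
```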

Update 2 (to make the match case-insensitive and to ensure the column dtype is str, without which the logic may fail):

elementList = ['app','web']
valueString = ''  # must be initialized before appending to it
for eachValue in elementList:
    valueString += f'(?=.*{eachValue})'  # one lookahead per value
df[header] = df[header].astype(str).str.lower()  # cast to str and lower-case for case-insensitive matching
result = df[header].str.contains(valueString)

So many solutions. Which one is the most efficient?

The str.contains-based answers are generally fastest, though str.findall is also very fast on smaller DataFrames:

[Figure: timings vs. len(df)]

values = ['app', 'web']
pattern = ''.join(f'(?=.*{value})' for value in values)

def replace_dummies_all(df):
    return df.a.str.replace(' ', '').str.get_dummies(';')[values].all(1)

def findall_map(df):
    return df.a.str.findall('|'.join(values)).map(lambda x: len(set(x)) == len(values))

def lower_contains(df):
    return df.a.astype(str).str.lower().str.contains(pattern)

def contains_concat_all(df):
    return pd.concat([df.a.str.contains(l) for l in values], axis=1).all(1)

def contains(df):
    return df.a.str.contains(pattern)
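The timing code itself is not shown, so the following is only a rough sketch of how two of the functions above could be compared with the standard timeit module. The row counts, repeat settings and the base frame are assumptions, not the setup behind the original plot.

```python
import timeit

import pandas as pd

values = ["app", "web"]
pattern = "".join(f"(?=.*{value})" for value in values)

def contains(df):
    return df.a.str.contains(pattern)

def contains_concat_all(df):
    return pd.concat([df.a.str.contains(v) for v in values], axis=1).all(axis=1)

# Sample rows from the question, repeated to reach each target length
base = pd.DataFrame({"a": ["app;", "app; web;", "web;", ""]}, dtype=object)

for n in (100, 1_000, 10_000):  # hypothetical sizes
    df = pd.concat([base] * (n // len(base)), ignore_index=True)
    for func in (contains, contains_concat_all):
        # Best of 3 runs, 5 calls each, reported per call
        seconds = min(timeit.repeat(lambda: func(df), number=5, repeat=3)) / 5
        print(f"{func.__name__:>20}  len(df)={len(df):>6}  {seconds:.6f} s")
```

Absolute numbers will depend on the pandas version and the machine, so only the relative ordering is meaningful.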
