
Check if pandas dataframe contains specific string from a list of items

I have a list:

my_list = ['element1 line','element2 ','element3', 'element4 line',....]

and I have a pandas dataframe with a df['Sentences'] column and a df['flag'] column:

df
    Sentences               flag
0   abcd    
1   efgh    
2   element1 ijkl           
3   mnop element3 element4      
4   qrst

I want to iterate over every row of the Sentences column. If any of the elements in my_list is present in Sentences, the df['flag'] column should be 1 in the respective row. If no element is present in the sentence in that row, df['flag'] should be 0 for that row.

Expected output:

df
    Sentences                flag
0   abcd                      0
1   efgh                      0
2   element1 ijkl             1 
3   mnop element3 element4    1     
4   qrst                      0

You need to use a loop:

df['flag'] = [int(any(w in my_list for w in x.split())) for x in df['Sentences']]

Output:

                Sentences  flag
0                    abcd     0
1                    efgh     0
2           element1 ijkl     1
3  mnop element3 element4     1
4                    qrst     0
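For reference, here is a self-contained sketch of the loop approach. The list in the question is truncated, so the single-word `my_list` below is an assumption made only for illustration, and `lookup` is a hypothetical helper name. Converting the list to a set keeps each membership test cheap when the list is long.

```python
import pandas as pd

# Assumed single-word items; the list in the question is truncated.
my_list = ['element1', 'element2', 'element3', 'element4']

df = pd.DataFrame({'Sentences': ['abcd', 'efgh', 'element1 ijkl',
                                 'mnop element3 element4', 'qrst']})

# A set gives constant-time membership tests (hypothetical helper name).
lookup = set(my_list)

# Flag a row with 1 when at least one whitespace-separated word is in the set.
df['flag'] = [int(any(w in lookup for w in s.split())) for s in df['Sentences']]

print(df)
```

Note that this compares whole words: an item such as 'element1 line' would never equal a single word produced by split().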

Note that you could use pure pandas, but this is much slower:

df['flag'] = (df['Sentences']
              .str.split()
              .explode().isin(my_list)
              .groupby(level=0).any().astype(int)
              )

You can also try this without two for-loops:

# Parentheses allow the method chain to continue on the next line
df['flag'] = (df['Sentences'].str.split().map(set)
              .apply(lambda x: int(bool(x.intersection(my_list)))))
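All of the answers above compare whole whitespace-separated words. If the list items can contain spaces, or if a match anywhere inside the sentence should count, a substring search may fit better. A minimal sketch, assuming the hypothetical `my_list` below and using `Series.str.contains` with an escaped regular-expression alternation:

```python
import re

import pandas as pd

# Hypothetical items, including one that contains a space.
my_list = ['element1 ijkl', 'element2', 'element3']

df = pd.DataFrame({'Sentences': ['abcd', 'efgh', 'element1 ijkl',
                                 'mnop element3 element4', 'qrst']})

# Escape each item so regex metacharacters are treated literally,
# then join the items into one alternation pattern.
pattern = '|'.join(re.escape(item) for item in my_list)

# True where any item occurs as a substring of the sentence; cast to 0/1.
df['flag'] = df['Sentences'].str.contains(pattern, regex=True).astype(int)

print(df)
```

Substring matching is looser than word matching: 'element3' would also match inside a longer word such as 'element30'. An empty `my_list` produces an empty pattern, which matches every row, so that case would need separate handling.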

Hi, is it possible to return the matching value from the list instead of only true or false?

Something like:

0 abcd
1 efgh
2 element1 ijkl element1
3 mnop element3 element4 element3
4 qrst

Thank you
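One possible way to do this, sketched under assumptions: the single-word `my_list`, the `first_match` helper, and the `match` column name below are all hypothetical. The helper returns the first word of each sentence that appears in the list, or an empty string when nothing matches.

```python
import pandas as pd

# Assumed single-word items; the list in the question is truncated.
my_list = ['element1', 'element2', 'element3', 'element4']
lookup = set(my_list)

df = pd.DataFrame({'Sentences': ['abcd', 'efgh', 'element1 ijkl',
                                 'mnop element3 element4', 'qrst']})


def first_match(sentence):
    """Return the first word of `sentence` found in `lookup`, or '' if none."""
    return next((w for w in sentence.split() if w in lookup), '')


# Hypothetical column name holding the matched item for each row.
df['match'] = df['Sentences'].map(first_match)

print(df)
```

To keep every match rather than only the first, the helper could return a list comprehension over the same condition instead of calling next().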

