
For each row in a Pandas dataframe, check if the row contains a string from a list

I have a given list of strings, like this:

List=['plastic', 'cardboard', 'wood']

I have a column of dtype string in my dataframe, like this:

Column=['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']

For each row in the column, I want to know whether the row contains a word from the list, and if so, keep only the text that comes before that word, like this:

New_Column=['beer', 'water', 'eggs', 'fruits']

Is there a way to do this systematically for every row of my dataframe (millions of rows)? Thanks
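With millions of rows, a vectorized string method is usually a better fit than a Python-level loop over rows. A minimal sketch, assuming the column is called `Column` and that list words should only match as whole words: join the list into one regular expression and let `Series.str.replace` delete the first list word together with everything after it. The names `words` and `pattern` are illustrative.

```python
import re
import pandas as pd

words = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame({'Column': ['beer plastic', 'water cardboard',
                              'eggs plastic', 'fruits wood']})

# \s* removes the space before the word, \b...\b requires a whole-word match,
# and .* also drops anything that follows the word.
pattern = r'\s*\b(?:' + '|'.join(map(re.escape, words)) + r')\b.*'

# Rows that contain none of the words are left unchanged.
df['New_Column'] = df['Column'].str.replace(pattern, '', regex=True)
print(df['New_Column'].tolist())  # ['beer', 'water', 'eggs', 'fruits']
```

Dropping the `\b` anchors would also let partial-word matches count (e.g. `wood` inside `woodland`).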

PS. I have tried building a function with regular-expression pattern matching, like this:

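import re
import pandas as pd

# Alternation of all list words, matched as whole words,
# e.g. \b(?:plastic|cardboard|wood)\b
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, List)) + r')\b')

def truncate(row, pattern):
    text = row['Column']
    m = pattern.search(text)  # search(), not match(): the word is not at position 0
    if m:
        return text[:m.start()].rstrip()  # keep only the text before the word
    return text  # no list word: return the text unchanged instead of None

df['New_column'] = df.apply(truncate, axis=1, pattern=pattern)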
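Two details matter when matching the list words with `re`: `match()` only tries position 0 of the string, while `search()` scans the whole string, and the words should be joined into a single alternation, escaped with `re.escape` in case any of them contains regex metacharacters. A small standard-library sketch of both points (the names `words`, `alternation` and `pattern` are illustrative):

```python
import re

words = ['plastic', 'cardboard', 'wood']
# re.escape protects against words that contain regex metacharacters
alternation = '|'.join(map(re.escape, words))
pattern = re.compile(r'\b(?:' + alternation + r')\b')

print(pattern.pattern)                       # \b(?:plastic|cardboard|wood)\b
print(pattern.match('beer plastic'))         # None: match() only tries position 0
m = pattern.search('beer plastic')           # search() scans the whole string
print(m.group(), m.start())                  # plastic 5
print('beer plastic'[:m.start()].rstrip())   # beer
```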
##df

      0
0     beer plastic
1  water cardboard
2     eggs plastic
3      fruits wood


l=['plastic', 'cardboard', 'wood']


Using str.findall:

df[0].str.findall(r'\w+\s*(?=' + '|'.join(l) + ')').apply(lambda x: x[0].strip() if len(x) else 'NotFound')

##output
0      beer
1     water
2      eggs
3    fruits
Name: 0, dtype: object
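`str.findall` builds a whole list per row only to keep its first element. A sketch of the same idea using `Series.str.extract`, which returns just the first match per row (NaN where there is none). Unlike the answer above, unmatched rows here fall back to their original text rather than `'NotFound'`; that choice is an assumption, and the extra `'juice glass'` row is made-up test data.

```python
import re
import pandas as pd

l = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame({0: ['beer plastic', 'water cardboard', 'eggs plastic',
                       'fruits wood', 'juice glass']})

# Lazily capture everything before the first list word in a named group.
pattern = r'^(?P<before>.*?)\s*\b(?:' + '|'.join(map(re.escape, l)) + r')\b'

before = df[0].str.extract(pattern)['before']
result = before.fillna(df[0])  # rows with no list word keep their original text
print(result.tolist())
```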
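import pandas as pd
...
for index, row in df.iterrows():
    value = row['Column_name']
    for word in List_name:
        # partition(word)[0] is the text before the first occurrence of word
        if word in value:
            value = value.partition(word)[0].rstrip()
    # assigning to `row` is not guaranteed to update df, so write back via df.at
    df.at[index, 'Column_name'] = value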
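`str.partition(sep)` returns a 3-tuple `(before, sep, after)`, and `(text, '', '')` when `sep` is absent, so `partition(word)[0]` is safe even without the `word in ...` check. A sketch of the same logic without `iterrows`, applying a hypothetical helper `cut_before` through a list comprehension (the sample rows are made up):

```python
import pandas as pd

words = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame({'Column_name': ['beer plastic', 'water cardboard',
                                   'eggs plastic', 'fruits wood', 'juice glass']})

def cut_before(text, words):
    """Hypothetical helper: return the text before the earliest list word."""
    for word in words:
        # (before, word, after), or (text, '', '') if word does not occur
        text = text.partition(word)[0]
    return text.rstrip()

print('beer plastic'.partition('plastic'))  # ('beer ', 'plastic', '')
df['Column_name'] = [cut_before(t, words) for t in df['Column_name']]
print(df['Column_name'].tolist())
```

Cutting at each word in turn leaves the text before the earliest list word, whatever order the list is in.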

If you want to run a working example:

import pandas as pd

List = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame([{'c1': "fun cardboard", 'c2': "jolly plastic"},
                   {'c1': "meh wood", 'c2': "aba"},
                   {'c1': "aaa", 'c2': "bbb"},
                   {'c1': "old wood", 'c2': "bbb"}])

for index, row in df.iterrows():
    for col in ['c1', 'c2']:
        value = row[col]
        for word in List:
            if word in value:
                value = value.partition(word)[0].rstrip()
        # write back via df.at; modifying `row` does not reliably update df
        df.at[index, col] = value
df
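When several columns need the same treatment, `DataFrame.replace` with `regex=True` performs the substitution inside every string cell, so the columns do not have to be listed one by one. A minimal sketch on sample rows like those above, assuming whole-word matching; the names `words`, `pattern` and `out` are illustrative.

```python
import re
import pandas as pd

words = ['plastic', 'cardboard', 'wood']
df = pd.DataFrame([{'c1': "fun cardboard", 'c2': "jolly plastic"},
                   {'c1': "meh wood", 'c2': "aba"},
                   {'c1': "aaa", 'c2': "bbb"},
                   {'c1': "old wood", 'c2': "bbb"}])

# Delete the first list word, the whitespace before it and everything after it.
pattern = r'\s*\b(?:' + '|'.join(map(re.escape, words)) + r')\b.*'

# regex=True substitutes inside the string cells of every column (c1 and c2).
out = df.replace(pattern, '', regex=True)
print(out)
```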


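Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.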

Related questions:
- Check if each row in a pandas series contains a string from a list using apply?
- Check if a column of a pandas dataframe contains a substring for each row of a different column?
- Removing rows in a pandas DataFrame where the row contains a string present in a list?
- Creating a Pandas DataFrame from list of dictionaries? Each dictionary as row in DataFrame?
- How to check if a word is in each row of a pandas dataframe
- Check if pandas dataframe contains specific string from a list of items
- Occurrence frequency from a list against each row in Pandas dataframe
- Remove empty strings from a list of strings on each row in a pandas dataframe
- Python pandas: construct list dataclass objects from each row of a dataframe
- Python 2.7 / Pandas: writing new string from each row in dataframe
 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM