
Finding exact word in description column of DataFrame in Python

My list contains some words, for example ['orange', 'cool', 'app', ...], and I want to output all of these exact whole words (if present) from a description column in a DataFrame.

I have also attached a sample picture with code. I used str.findall(). As the picture shows, it extracts add from additional and app from apple. However, I do not want that. It should only output a word if it matches the whole word.

You can fix the code using

df['exactmatch'] = df['text'].str.findall(fr"\b({'|'.join(list1)})\b").str.join(", ")

Or, if there can be special characters in your list1 words,

import re

df['exactmatch'] = df['text'].str.findall(fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)").str.join(", ")
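To make the difference between the two variants concrete, here is a minimal sketch. The sample rows and the words 'c++' and 'c#' are made up for illustration; only the names df, text, list1 and exactmatch come from the snippets above.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"text": ["I like c++ and c#", "abc++ is not a match"]})
list1 = ["c++", "c#"]

# Escape each word so that "+" and "#" are matched literally, and use
# lookarounds instead of \b so that words ending in a non-word character
# can still be matched as whole tokens.
pattern = fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)"
df["exactmatch"] = df["text"].str.findall(pattern).str.join(", ")

# For comparison: the \b version (even with escaping) finds nothing in the
# first row, because there is no word boundary between "+" and a space.
with_word_boundary = re.findall(
    fr"\b({'|'.join(map(re.escape, list1))})\b", df["text"][0]
)

print(df)
print(with_word_boundary)
```

In this sketch the second row yields an empty string, since the only c++ there is preceded by a word character.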

The patterns created by fr"\b({'|'.join(list1)})\b" and fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)" will look like

\b(orange|cool|app)\b
(?<!\w)(orange|cool|app)(?!\w)
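As a rough end-to-end check of the whole-word behaviour, the sketch below compares an unanchored alternation with the \b version. The two sample sentences are invented, and 'add' is included in list1 on the assumption that it was in the asker's list (the question mentions add being extracted from additional).

```python
import pandas as pd

# Hypothetical sample data; the column name "text" follows the answer above.
df = pd.DataFrame(
    {"text": ["an apple with additional cool toppings", "add the orange app"]}
)
list1 = ["orange", "cool", "app", "add"]

# Without boundaries the alternation also matches inside longer words.
unbounded = fr"({'|'.join(list1)})"
# With \b on both sides only whole words match.
bounded = fr"\b({'|'.join(list1)})\b"

df["substring"] = df["text"].str.findall(unbounded).str.join(", ")
df["exactmatch"] = df["text"].str.findall(bounded).str.join(", ")

print(df[["substring", "exactmatch"]])
```

In the first row the unanchored pattern picks up app from apple and add from additional, while the \b pattern keeps only cool.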

See the regex demo. Note that .str.join(", ") is considered faster than .apply(", ".join).
