Finding exact word in description column of DataFrame in Python
My list contains some words like ['orange', 'cool', 'app', ...], and I want to output all of these exact whole words (if present) from a description column in a DataFrame.
I have also attached a sample picture with code. I used str.findall(). As the picture shows, it extracts add from additional and app from apple. However, I do not want that. It should only output a word if it matches the whole word.
You can fix the code using:
df['exactmatch'] = df['text'].str.findall(fr"\b({'|'.join(list1)})\b").str.join(", ")
Or, if there can be special chars in your list1 words (this needs import re):
df['exactmatch'] = df['text'].str.findall(fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)").str.join(", ")
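To see why the lookaround form matters when a word contains special characters, here is a minimal sketch using only the re module. The word "c++" and the sample sentence are made up for illustration and are not from the question.

```python
import re

# Hypothetical word list; "c++" stands in for any word with special characters
words = ["orange", "cool", "app", "c++"]
text = "A cool apple app written in c++ with additional notes"

# Escape each word so characters like "+" are matched literally
escaped = "|".join(map(re.escape, words))

# \b needs a word character on one side, so it fails right after "c++"
with_boundary = re.findall(fr"\b({escaped})\b", text)

# Lookarounds only require that no word character touches the match
with_lookaround = re.findall(fr"(?<!\w)({escaped})(?!\w)", text)

print(with_boundary)    # ['cool', 'app']
print(with_lookaround)  # ['cool', 'app', 'c++']
```

In both cases app is not pulled out of apple. Only the lookaround version also finds "c++", because \b cannot match between "+" and a space.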
The patterns created by fr"\b({'|'.join(list1)})\b" and fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)" will look like
\b(orange|cool|app)\b
(?<!\w)(orange|cool|app)(?!\w)
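As a quick end-to-end check, the sketch below builds the second pattern and applies it to a small DataFrame. The column name text and list1 follow the answer; the sample rows are assumptions made up for illustration, since the original picture is not available.

```python
import re
import pandas as pd

list1 = ["orange", "cool", "app"]

# Hypothetical sample data standing in for the description column
df = pd.DataFrame({"text": [
    "an additional apple",
    "a cool app with orange theme",
    "nothing here",
]})

# Build the whole-word pattern from the list
pattern = fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)"
print(pattern)  # (?<!\w)(orange|cool|app)(?!\w)

# findall returns a list of matches per row; join turns it into one string
df["exactmatch"] = df["text"].str.findall(pattern).str.join(", ")
print(df["exactmatch"].tolist())  # ['', 'cool, app, orange', '']
```

Rows with no whole-word match end up with an empty string, so additional and apple no longer produce add or app.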
See the regex demo. Note that .str.join(", ") is considered faster than .apply(", ".join).
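If you want to check the two join approaches on your own data, a rough sketch along these lines may help. The Series of lists is made up to mimic what str.findall() returns, and the timings will vary with pandas version and data size, so treat the result as indicative only.

```python
import timeit
import pandas as pd

# Hypothetical data: each cell is a list of matched words, as str.findall() returns
matches = pd.Series([["cool", "app"], [], ["orange"]] * 1000)

joined_str = matches.str.join(", ")
joined_apply = matches.apply(", ".join)

# Both approaches should produce the same strings
same = joined_str.tolist() == joined_apply.tolist()
print(same)

# Rough timing comparison; numbers depend on the environment
t_str = timeit.timeit(lambda: matches.str.join(", "), number=20)
t_apply = timeit.timeit(lambda: matches.apply(", ".join), number=20)
print(f".str.join: {t_str:.4f}s  .apply: {t_apply:.4f}s")
```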