Find exact words in the description column of a DataFrame in Python

Question

My list contains some words like ['orange', 'cool', 'app', ...] and I want to output all of these exact whole words (if present) from a description column in a DataFrame.

I have also attached a sample picture with code. I used str.findall() and, as the picture shows, it extracts add from additional and app from apple. However, I do not want that. It should only output a word if it matches the whole word.

Answer 1

You can fix the code using word boundaries:

df['exactmatch'] = df['text'].str.findall(fr"\b({'|'.join(list1)})\b").str.join(", ")

Or, if the words in list1 can contain special characters (this needs import re):

df['exactmatch'] = df['text'].str.findall(fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)").str.join(", ")

The patterns created by fr"\b({'|'.join(list1)})\b" and fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)" will look like

\b(orange|cool|app)\b
(?<!\w)(orange|cool|app)(?!\w)
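
To illustrate why the second form matters, here is a minimal sketch using only the re module. The word "c++" and the sample sentence are made up for the demonstration and are not from the question. With \b, a word ending in a non-word character such as + cannot match before a space, because there is no word boundary between two non-word characters; the lookaround version does not have that problem.

```python
import re

# Hypothetical word list; "c++" is added here only to show the special-character case.
words = ["orange", "cool", "app", "c++"]

# Escape each word so characters like "+" are treated literally, then join with "|".
alternation = "|".join(map(re.escape, words))

boundary_pattern = re.compile(rf"\b({alternation})\b")
lookaround_pattern = re.compile(rf"(?<!\w)({alternation})(?!\w)")

# Made-up sample text: "apple" must not yield "app".
text = "A cool app for apple and orange fans who like c++ too"

boundary_hits = boundary_pattern.findall(text)
lookaround_hits = lookaround_pattern.findall(text)

print(boundary_hits)    # "c++" is missed: no \b between "+" and the space
print(lookaround_hits)  # "c++" is found
```

Both patterns skip the app inside apple; only the lookaround pattern also picks up c++.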

See the regex demo. Note that .str.join(", ") is considered faster than .apply(", ".join).
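
For reference, the whole approach can be tried end to end with a small DataFrame. This is only a sketch: the column name text and the list name list1 follow the code in this answer, while the sample rows and the extra word "add" are assumptions, since the original data is only available as a picture.

```python
import re

import pandas as pd

# Assumed sample data; the real DataFrame from the question is not shown in text form.
list1 = ["orange", "cool", "app", "add"]
df = pd.DataFrame(
    {
        "text": [
            "additional apple pie",
            "a cool app with orange theme",
            "add more, app!",
        ]
    }
)

# Build the alternation once, escaping any regex metacharacters in the words.
alternation = "|".join(map(re.escape, list1))
pattern = rf"(?<!\w)({alternation})(?!\w)"

# findall returns a list of matches per row; str.join turns each list into one string.
df["exactmatch"] = df["text"].str.findall(pattern).str.join(", ")

print(df)
```

Rows with no whole-word match end up with an empty string in exactmatch, because joining an empty list gives "".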

Solution 1
1 2020-10-09 17:03:57
