Removing custom words from a list (Part II) - Python

Question

I have a df like this:

df = pd.DataFrame({'PageNumber': [175, 162, 576], 'new_tags': [['flower architecture people'], ['hair red bobbles'], ['sweets chocolate shop']})

<OUT>
PageNumber   new_tags
   175       flower architecture people...
   162       hair red bobbles...
   576       sweets chocolate shop...

And another df, which serves as the reference df (see below):

top_words = pd.DataFrame({'ID': [1, 2, 3], 'tag': ['flower', 'people', 'chocolate']})

<OUT>
   ID      tag
   1       flower
   2       people
   3       chocolate

I am trying to remove values from the lists in df based on the values in the other df. The output I am hoping to get is:

<OUT> df
PageNumber   new_tags
   175       flower people
   576       chocolate

I tried the inner-join approach from "Filtering a dataframe based on another dataframe's column values", but unfortunately had no luck.

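So I resorted to tokenizing all the tags in both df columns, then tried to iterate over each tag and keep only the values that appear in the reference df. Currently, it returns empty lists...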

df['tokenised_new_tags'] = filtered_new["new_tags"].astype(str).apply(nltk.word_tokenize)
topic_words['tokenised_top_words']= topic_words['tag'].astype(str).apply(nltk.word_tokenize)
df['top_word_tokens'] = [[t for t in tok_sent if t in topic_words['tokenised_top_words']] for tok_sent in df['tokenised_new_tags']]

Any help is much appreciated - thanks!
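One likely cause of the empty lists in the attempt above: the `in` operator on a pandas Series tests the index labels, not the stored values, so `t in topic_words['tokenised_top_words']` never matches a word. On top of that, each element of that column is itself a list of tokens. The snippet below is a minimal sketch of the pitfall; `ref`, `ref_words` and `tokens` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the reference column from the question.
ref = pd.Series(["flower", "people", "chocolate"])

# `in` on a Series checks the index labels (0, 1, 2), not the values.
label_hit = 0 in ref           # 0 is an index label
value_hit = "flower" in ref    # "flower" is a value, not a label

# Converting the values to a set gives the intended membership test.
ref_words = set(ref)
tokens = ["flower", "architecture", "people"]
kept = [t for t in tokens if t in ref_words]

print(label_hit, value_hit, kept)
```

Testing membership against a plain set (or list) of the reference words avoids the index lookup entirely.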

Answer 1

How about this:

def remove_custom_words(phrase, words_to_remove_list):
    return [elem for elem in phrase.split(' ') if elem not in words_to_remove_list]


df['new_tags'] = df['new_tags'].apply(lambda x: remove_custom_words(x[0],top_words['tag'].to_list()))

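Basically, I apply the remove_custom_words function to each row of the dataset, and it filters out the words contained in top_words['tag']. x[0] is used because each cell of new_tags is a one-element list holding a space-separated string. Note that this removes the listed words; to keep only those words instead, as in the desired output in the question, change `not in` to `in`.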
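For the output shown in the question (only the reference words kept, and rows with no match dropped), a variant along the following lines might work. This is a minimal sketch rather than the answerer's method: `keep_reference_words`, `allowed` and `result` are hypothetical names, and the tags are assumed to be stored as one space-separated string inside a one-element list, as in the question's sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "PageNumber": [175, 162, 576],
    "new_tags": [["flower architecture people"],
                 ["hair red bobbles"],
                 ["sweets chocolate shop"]],
})
top_words = pd.DataFrame({"ID": [1, 2, 3],
                          "tag": ["flower", "people", "chocolate"]})


def keep_reference_words(phrase, allowed):
    """Return the words of `phrase` that appear in `allowed`, joined by spaces."""
    return " ".join(word for word in phrase.split() if word in allowed)


# A set makes each membership check a hash lookup instead of a list scan.
allowed = set(top_words["tag"])

result = df.copy()
# Each cell is a one-element list, so cell[0] is the space-separated string.
result["new_tags"] = result["new_tags"].apply(
    lambda cell: keep_reference_words(cell[0], allowed)
)
# Drop rows where no reference word was found.
result = result[result["new_tags"] != ""].reset_index(drop=True)

print(result)
```

The sketch returns a string per row to match the question's expected output; returning a list instead only requires dropping the `" ".join(...)` call and adjusting the empty-row check.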

Solution 1
0 Accepted 2022-05-02 21:27:46
