How to tag keywords in Python and add the tags to a new column

Question

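I'm trying to extract the tags for a sentence with the code below, but it returns the keywords instead. What am I missing? How can I output all the tags (rather than the keywords), comma-separated, in a new column?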

s = set(dict_list)
f = lambda x: ', '.join(set([y for y in x.split() if y in s]))
# df['tags'] = df['description_summary'].apply(f)

df['tags'] = df['description_summary'].apply(lambda x: ', '.join(set(x.split()).intersection(s)))
df

This is essentially the data I'm using, from an Excel file:

    description_summary

0   Long sentence with keywords ball and hot
1   Long sentence with keywords stick, glove, and cold

This is the current (wrong) output:

     description_summary                                     keywords instead of tags

0    Long sentence with keywords ball and hot                ball, hot
1    Long sentence with keywords cold, stick, and glove      cold, stick, glove

This is the output I want:

     description_summary                                     tags

0    Long sentence with keywords ball and hot                toy, temperature
1    Long sentence with keywords cold, stick, and glove      temperature, toy

This is the dictionary of keywords and tags ('keyword': 'tag'):

dict_list = {'Hot': 'Temperature',
 'Cold': 'Temperature',
 'Very cold': 'Temperature',
 'Ball': 'Toy',
 'Glove': 'Toy',
 'Stick': 'Toy'
 }

How can I output only the tags (comma-separated) in a new column of the same file?

Answer 1

You can use ordinary dictionary indexing to return the associated value (the tag) instead of the key (the keyword) itself.
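As a minimal illustration of the difference, here is a sketch using a shortened, lowercased copy of the dictionary from the question (the names `keyword_to_tag`, `words`, `matched_keywords` and `matched_tags` are made up for this example). Intersecting the words with the dictionary's keys can only ever return keys, whereas indexing the dictionary with each matched key returns its tag.

```python
# Shortened, lowercased copy of the question's dictionary (illustrative only).
keyword_to_tag = {'hot': 'temperature', 'ball': 'toy'}

words = 'Long sentence with keywords ball and hot'.split()

# Intersecting with the dictionary's keys returns the keywords themselves.
matched_keywords = sorted(set(words) & set(keyword_to_tag))

# Indexing the dictionary with each matched keyword returns its tag.
matched_tags = sorted({keyword_to_tag[w] for w in words if w in keyword_to_tag})

print(matched_keywords)  # ['ball', 'hot']
print(matched_tags)      # ['temperature', 'toy']
```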

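Note that I've edited the dictionary from your question so that it is easier to verify that this works, and that you also need to account for case sensitivity.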

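import pandas as pd

df = pd.DataFrame({'description_summary':['Long sentence with keywords ball and hot',
                                          'Long sentence with keywords cold, stick, and glove']})

# Each value names its keyword so it is easy to see which keyword matched.
dict_list = {'Hot': 'Temperature (hot)',
             'Cold': 'Temperature (cold)',
             'Very cold': 'Temperature (very cold)',
             'Ball': 'Toy (ball)',
             'Glove': 'Toy (glove)',
             'Stick': 'Toy (stick)'}

# Lowercase keys and values so that matching does not depend on case.
d_lower = {key.lower():value.lower() for key, value in dict_list.items()}

# Look up the tag of every keyword that occurs (as a substring) in the sentence;
# sorted() keeps the order of the joined tags stable.
df['tags'] = df['description_summary'].apply(lambda x: ', '.join(
      sorted({d_lower[y] for y in d_lower if y in x.lower()})
    ))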

This produces the 'tags' column:

0                   temperature (hot), toy (ball)
1    temperature (cold), toy (glove), toy (stick)
Name: tags, dtype: object
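The code above matches keywords as substrings, so a keyword such as 'hot' would also match inside a longer word like 'photo'. A possible variant, sketched here as an assumption rather than as part of the original answer, matches whole words with the standard `re` module and uses the unedited dictionary from the question. The helper name `tags_for` is made up for this example.

```python
import re

# Dictionary from the question; keys and values lowercased for matching.
dict_list = {'Hot': 'Temperature', 'Cold': 'Temperature', 'Very cold': 'Temperature',
             'Ball': 'Toy', 'Glove': 'Toy', 'Stick': 'Toy'}
d_lower = {key.lower(): value.lower() for key, value in dict_list.items()}

# One pattern for all keywords; longer keywords come first so that
# 'very cold' is tried before 'cold'.
keywords = sorted(d_lower, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b')

def tags_for(text):
    """Return the distinct tags of the keywords found in text, in order of first appearance."""
    found = pattern.findall(text.lower())
    # dict.fromkeys drops duplicate tags while keeping first-seen order.
    return ', '.join(dict.fromkeys(d_lower[k] for k in found))

print(tags_for('Long sentence with keywords ball and hot'))            # toy, temperature
print(tags_for('Long sentence with keywords cold, stick, and glove'))  # temperature, toy
```

With the DataFrame from the answer, this could be applied as `df['tags'] = df['description_summary'].apply(tags_for)`. Writing the result back to the Excel file is left out because the question does not give the file name.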

Solution 1
0 2021-06-16 19:26:06
