
How to tag keywords and add to new column in Python

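I am trying to extract the tags for the keywords found in a sentence using the code below, but it returns the keywords instead. What am I missing? How can I output all the tags (not the keywords), separated by commas, in a new column?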

s = set(dict_list)
f = lambda x: ', '.join(set([y for y in x.split() if y in s]))
# df['tags'] = df['description_summary'].apply(f)

df['tags'] = df['description_summary'].apply(lambda x: ', '.join(set(x.split()).intersection(s)))
df

This is basically the data I am using, from an Excel file:

    description_summary

0   Long sentence with keywords ball and hot
1   Long sentence with keywords stick, glove, and cold

This is the current (wrong) output:

     description_summary                                     keywords instead of tags

0    Long sentence with keywords ball and hot                ball, hot
1    Long sentence with keywords cold, stick, and glove      cold, stick, glove

This is the output I want:

     description_summary                                     tags

0    Long sentence with keywords ball and hot                toy, temperature
1    Long sentence with keywords cold, stick, and glove      temperature, toy 

This is the dictionary of keywords and tags ('keywords': 'tags'):

dict_list = {'Hot': 'Temperature',
 'Cold': 'Temperature',
 'Very cold': 'Temperature',
 'Ball': 'Toy',
 'Glove': 'Toy',
 'Stick': 'Toy'
 }

How can I output only the tags (separated by commas) in a new column of the same file?

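You can use ordinary dictionary indexing to return the associated value rather than the key itself.

Note that I have edited the dictionary values from your question to make it easier to verify that this works. You will also need to account for case sensitivity.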

import pandas as pd

df = pd.DataFrame({'description_summary':['Long sentence with keywords ball and hot',
                                          'Long sentence with keywords cold, stick, and glove']})

dict_list = {'Hot': 'Temperature (hot)',
             'Cold': 'Temperature (cold)',
             'Very cold': 'Temperature (very cold)',
             'Ball': 'Toy (ball)',
             'Glove': 'Toy (glove)',
             'Stick': 'Toy (stick)'}

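# Lowercase both keys and values so that matching is case-insensitive
d_lower = {key.lower(): value.lower() for key, value in dict_list.items()}

# Look up the tag for every keyword that occurs in the text (substring match);
# sorted() keeps the order of the joined tags deterministic
df['tags'] = df['description_summary'].apply(lambda x: ', '.join(
      sorted(set(d_lower[y] for y in d_lower if y in x.lower()))
    ))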

This produces the 'tags' column:

0                   temperature (hot), toy (ball)
1    temperature (cold), toy (glove), toy (stick)
Name: tags, dtype: object
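
The approach above matches keywords as substrings, so a key such as "hot" would also match inside an unrelated word like "hotel". The following is a minimal sketch of one possible variant, not the answerer's code: it assumes whole-word, case-insensitive matching is wanted, uses the original dictionary from the question, and lists each tag once in order of first appearance. The names `pattern` and `tags_for` are hypothetical and exist only for this illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({'description_summary': [
    'Long sentence with keywords ball and hot',
    'Long sentence with keywords cold, stick, and glove',
]})

# Original keyword -> tag mapping from the question
dict_list = {'Hot': 'Temperature',
             'Cold': 'Temperature',
             'Very cold': 'Temperature',
             'Ball': 'Toy',
             'Glove': 'Toy',
             'Stick': 'Toy'}

# Lowercase keys and values so matching ignores case
d_lower = {key.lower(): value.lower() for key, value in dict_list.items()}

# One regex alternation over all keywords, longest first so that
# "very cold" is tried before "cold"; \b restricts matches to whole words
pattern = re.compile(
    r'\b(?:'
    + '|'.join(re.escape(k) for k in sorted(d_lower, key=len, reverse=True))
    + r')\b'
)

def tags_for(text):
    """Return the comma-separated tags for the keywords found in text."""
    found = (d_lower[m.group(0)] for m in pattern.finditer(text.lower()))
    # dict.fromkeys removes duplicates while keeping first-seen order
    return ', '.join(dict.fromkeys(found))

df['tags'] = df['description_summary'].apply(tags_for)
print(df['tags'])
```

With the two sample rows, this gives "toy, temperature" and "temperature, toy", which is the output requested in the question. Using `dict.fromkeys` instead of `set` is a deliberate choice here so that the tag order follows the sentence rather than depending on hash ordering.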

