Replace each word in a sentence with its word index using Python

Question

I have two CSV files. One of them contains sentences like this:

                         sentences
0  yes good bye how should and bye
1                       bye should
2                         good bye

The other CSV contains each word with its index next to it, as shown:

     word  frequency  index
0     and        500     10
1     you        334      1
2     how        320      2
3  should        250      3
4     yes        100      4
5     bye         50      5
6    good          1      6

I am trying to solve this with a dictionary, but it prints strange output for only one word instead of the whole sentence:

import string
import pandas as pd
text=pd.read_csv("one.csv")

change=pd.read_csv("result.csv")
print(text)
update = dict(zip(change.word, change.index))
print(update)
text1 = text['sentences'].replace(update, regex=True)
print(text1)
text1.to_csv('yes.csv', header=False, index=False)

I expect the output to be:

4 6 5 2 3 10 5

5 3

6 5

But that is not the output I get.

What am I doing wrong? Is there any solution?

Answer 1

After splitting each row, you can use a list comprehension with Series.get on every item:

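# df1 is the sentences frame, df2 the word/index frame
s = df2.set_index('word')['index']
# look up every whitespace-separated word; Series.get returns None for unknown words
final = df1.assign(index=[[s.get(a) for a in i.split()] for i in df1['sentences']])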

                         sentences                   index
0  yes good bye how should and bye  [4, 6, 5, 2, 3, 10, 5]
1                       bye should                  [5, 3]
2                         good bye                  [6, 5]
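
As a minimal, self-contained sketch of the same lookup idea, the snippet below rebuilds the sample data inline (the original CSV files are not available) and joins the looked-up indices back into a space-separated string, which is the format the question asks for. The names `lookup`, `encode` and `encoded` are hypothetical, and keeping unknown words unchanged is an assumption, not something the answer specifies.

```python
import pandas as pd

# Sample data rebuilt from the question; the real code would read the two CSV files.
df1 = pd.DataFrame({"sentences": ["yes good bye how should and bye",
                                  "bye should",
                                  "good bye"]})
df2 = pd.DataFrame({
    "word": ["and", "you", "how", "should", "yes", "bye", "good"],
    "frequency": [500, 334, 320, 250, 100, 50, 1],
    "index": [10, 1, 2, 3, 4, 5, 6],
})

# Map word -> index. Use df2["index"]: df2.index would be the row index (0..6).
lookup = df2.set_index("word")["index"]

def encode(sentence):
    # Assumption: a word missing from the table is kept as-is instead of becoming None.
    return " ".join(str(lookup.get(word, word)) for word in sentence.split())

encoded = df1["sentences"].map(encode)
print(encoded.tolist())
```

Writing `encoded` with `to_csv(..., header=False, index=False)`, as in the question, should then give one encoded sentence per line.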

Answer 2

We can use a Series for the replacement. The key, however, seems to be converting the Series to str with Series.astype:

text['index']=text.sentences.replace(change.set_index('word')['index']
                                           .astype(str),
                                     regex = True)
print(text)
# Without .astype(str), the same call gives:
#text.sentences.replace(change.set_index('word')['index'],regex = True)
#0    10
#1     3
#2     5
#Name: sentences, dtype: int64

Output

                         sentences           index
0  yes good bye how should and bye  4 6 5 2 3 10 5
1                       bye should             5 3
2                         good bye             6 5
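
Two details are worth spelling out here. In pandas, attribute access such as `change.index` returns the frame's row index, not a column that happens to be named `index`, so the dictionary built in the question maps each word to its row position; the column has to be selected as `change['index']`. Also, a regex-based replace matches substrings, so a word like "and" would also be replaced inside a longer word. The sketch below is one possible variant, not the answer's own method: it uses Python's `re` module with word boundaries, and `mapping`, `pattern` and `to_indices` are hypothetical names.

```python
import re
import pandas as pd

# Sample data rebuilt from the question; the real code would read the two CSV files.
text = pd.DataFrame({"sentences": ["yes good bye how should and bye",
                                   "bye should",
                                   "good bye"]})
change = pd.DataFrame({
    "word": ["and", "you", "how", "should", "yes", "bye", "good"],
    "frequency": [500, 334, 320, 250, 100, 50, 1],
    "index": [10, 1, 2, 3, 4, 5, 6],
})

row_labels = list(change.index)        # row index of the frame: 0..6
column_values = list(change["index"])  # the actual "index" column

# Word -> index as strings, since the replacement text must be a string.
mapping = dict(zip(change["word"], change["index"].astype(str)))

# One alternation wrapped in word boundaries, so only whole words are replaced.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, mapping)) + r")\b")

def to_indices(sentence):
    # Replace every known whole word with its index; other text is left untouched.
    return pattern.sub(lambda m: mapping[m.group(0)], sentence)

text["index"] = text["sentences"].map(to_indices)
print(text)
```

Because the whole sentence is handled in a single pass, digits that have already been inserted are never matched again by a later replacement.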

Answer 1
2 Accepted 2019-12-25 12:25:04

Answer 2
2 2019-12-25 12:30:52
