Getting KeyError even though the key is present in the dict?
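I'm trying to look up word vectors for NLP sentiment analysis of reviews, but I get a KeyError even though the key exists in the dictionary. I don't understand why I'm getting this error.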
```
import numpy as np
import pandas as pd
from nltk.tokenize import word_tokenize

txt_fname = 'C:\\Users\\arune\\Desktop\\sentiment labelled sentences\\amazon_cells_labelled.txt'
df = pd.read_table(txt_fname, names=['sentence', 'sentiment'])
df['tokenized'] = df['sentence'].apply(lambda a: word_tokenize(a))

# Collect every distinct token in the corpus
vocab = set()
for tokens in df['tokenized']:
    for a in tokens:
        vocab.add(a)
len(vocab)
```
```
vocab = {a:b for a,b in enumerate(sorted(vocab))}
vocab
```
```
rand_wv = np.random.rand(len(vocab), 300)
rand_wv.shape
```
```
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.2, random_state=42)
```
```
X_train = []
for tok_sent in train['tokenized']:
    doc_vec = np.zeros(300)
    for t in tok_sent:
        word_index = vocab[t]
        word_vec = rand_wv[word_index]
        doc_vec += word_vec
    doc_vec = doc_vec/len(tok_sent)
    X_train.append(doc_vec)
X_train = np.array(X_train)
y_train = train['sentiment']
X_train.shape
```
The error I get is:
```
KeyError                                  Traceback (most recent call last)
<ipython-input-64-add70741d289> in <module>
      3     doc_vec = np.zeros(300)
      4     for t in tok_sent:
----> 5         word_index = vocab[t]
      6         word_vec = rand_wv[word_index]
      7         doc_vec += word_vec

KeyError: 'does'
```
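As Tim Roberts answered in the comments:

Your `vocab` table is built from `enumerate`, which yields `(index, word)` pairs, so its keys are integers starting at 0 and the words end up as the values. Looking up a token such as `'does'` therefore raises a `KeyError`.

As suggested, you should create the vocabulary like this, with the word as the key:
```
vocab = {w: id_ for id_, w in enumerate(sorted(vocab))}
```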
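To see the difference without the original dataset, here is a minimal sketch on a made-up token list. The names `tokens`, `unique_words`, `index_to_word` and `word_to_index` are illustrative only and do not appear in the question; the two comprehensions mirror the original and the corrected form.

```python
# Toy token list standing in for the question's tokenized corpus (hypothetical data)
tokens = ["does", "not", "work", "does"]
unique_words = sorted(set(tokens))

# Original comprehension: enumerate yields (index, word), so the keys are ints
index_to_word = {a: b for a, b in enumerate(unique_words)}

# Corrected comprehension: swap the pair so the keys are the words
word_to_index = {w: i for i, w in enumerate(unique_words)}

print("does" in index_to_word)   # False: the keys are 0, 1, 2
print(word_to_index["does"])     # 0
```

With the word-keyed mapping, `vocab[t]` in the training loop returns an integer row index that can be used directly with `rand_wv`.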