
Getting KeyError even though the key is present in the dict?

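I am trying to look up word vectors for review analysis (NLP sentiment analysis), but I get a KeyError even though the key is present in the dictionary. I don't understand why I am getting this error.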

```
import numpy as np
import pandas as pd
from nltk.tokenize import word_tokenize
from sklearn.model_selection import train_test_split

txt_fname = 'C:\\Users\\arune\\Desktop\\sentiment labelled sentences\\amazon_cells_labelled.txt'
df = pd.read_table(txt_fname, names=['sentence', 'sentiment'])

# Tokenize every sentence
df['tokenized'] = df['sentence'].apply(lambda a: word_tokenize(a))

# Collect the unique tokens
vocab = set()
for tokens in df['tokenized']:
    for a in tokens:
        vocab.add(a)

len(vocab)

vocab = {a: b for a, b in enumerate(sorted(vocab))}
vocab

# One random 300-dimensional vector per vocabulary entry
rand_wv = np.random.rand(len(vocab), 300)
rand_wv.shape

train, test = train_test_split(df, test_size=0.2, random_state=42)

X_train = []
for tok_sent in train['tokenized']:
    doc_vec = np.zeros(300)
    for t in tok_sent:
        word_index = vocab[t]
        word_vec = rand_wv[word_index]
        doc_vec += word_vec
    doc_vec = doc_vec / len(tok_sent)
    X_train.append(doc_vec)

X_train = np.array(X_train)
y_train = train['sentiment']
X_train.shape
```

The error I get is:

KeyError                                  Traceback (most recent call last)
<ipython-input-64-add70741d289> in <module>
      3     doc_vec = np.zeros(300)
      4     for t in tok_sent:
----> 5         word_index = vocab[t]
      6         word_vec = rand_wv[word_index]
      7         doc_vec += word_vec

KeyError: 'does'

As Tim Roberts answered in the comments:

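Your vocab table is created from enumerate, so its keys are integers starting at 0. The words are stored as the values, which is why vocab['does'] raises a KeyError even though 'does' appears in the dictionary.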
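To see this concretely, here is a minimal sketch on a toy token set (the three words are made up for illustration, not taken from the dataset). It reuses the comprehension from the question and shows that the words end up as values rather than keys:

```python
# Toy tokens standing in for the real vocabulary (illustrative assumption).
tokens = {"works", "does", "phone"}

# Same comprehension as in the question: enumerate yields (index, word),
# so `a` is the index and becomes the key.
buggy_vocab = {a: b for a, b in enumerate(sorted(tokens))}

print(buggy_vocab)                     # integer keys, words as values
print("does" in buggy_vocab)           # membership tests look at keys only
print("does" in buggy_vocab.values())  # the word is present, but as a value

try:
    buggy_vocab["does"]
except KeyError as exc:
    print("KeyError:", exc)
```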

As suggested, you should build the vocabulary like this, so that each word is the key and its index is the value:

vocab = {w: id_ for id_, w in enumerate(sorted(vocab))}
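With that change, the rest of the loop from the question should work as intended. The following is only a sketch under a few assumptions: two made-up sentences replace the Amazon reviews, str.split stands in for word_tokenize, and np.random.default_rng replaces np.random.rand so that the run is reproducible. The averaging is also written with NumPy indexing instead of the explicit inner loop, which gives the same per-sentence mean.

```python
import numpy as np

# Toy sentences; str.split stands in for word_tokenize (assumption).
sentences = ["the phone does work", "the battery does not last"]
tokenized = [s.split() for s in sentences]

# Collect the unique tokens, then map word -> row index (the corrected direction).
words = sorted({t for toks in tokenized for t in toks})
vocab = {w: id_ for id_, w in enumerate(words)}

# One random 300-dimensional vector per word, as in the question.
rng = np.random.default_rng(42)
rand_wv = rng.random((len(vocab), 300))

# Average the word vectors of each sentence.
X = np.array([rand_wv[[vocab[t] for t in toks]].mean(axis=0) for toks in tokenized])

print(vocab["does"], X.shape)
```

Note that a word that was never added to vocab would still raise a KeyError here; that cannot happen in the question because the vocabulary is built from the full DataFrame before the train/test split.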

