
Word vectors example issue in spaCy

from spacy.en import English
from numpy import dot
from numpy.linalg import norm

parser = English()

# you can access known words from the parser's vocabulary
nasa = parser.vocab['NASA']

# cosine similarity
cosine = lambda v1, v2: dot(v1, v2) / (norm(v1) * norm(v2))

# gather all known words, take only the lowercased versions
allWords = list({w for w in parser.vocab if w.has_repvec and w.orth_.islower() and w.lower_ != "nasa"})

# sort by similarity to NASA
allWords.sort(key=lambda w: cosine(w.repvec, nasa.repvec))
allWords.reverse()
print("Top 10 most similar words to NASA:")
for word in allWords[:10]:   
    print(word.orth_)

I am trying to run the above example, but am getting the errors below:

Traceback (most recent call last):
  File "C:\Users\bulusu.kiran\Documents\WORK\nlp\wordVectors1.py", line 8, in <module>
    nasa = parser.vocab['NASA']
  File "spacy/vocab.pyx", line 330, in spacy.vocab.Vocab.__getitem__ (spacy/vocab.cpp:7708)
    orth = id_or_string
TypeError: an integer is required

Example taken from: Intro to NLP with spaCy

What is causing this error?

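What version of Python are you using? This looks like a Unicode issue: in Python 2, 'NASA' is a byte string rather than a unicode string, and the traceback suggests the vocab lookup then falls through to treating it as an integer ID. I got it to work in Python 2.7 by replacing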

nasa = parser.vocab['NASA']

with

nasa = parser.vocab[u'NASA']
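For what it's worth, here is a minimal sketch of why the prefix matters. The helper as_text is hypothetical (it is not part of spaCy); it just makes sure a key is text rather than bytes before it is used for a lookup. In Python 2 a bare 'NASA' is a byte string, so it would be decoded; in Python 3 string literals are already text and pass through unchanged.

```python
def as_text(key, encoding="utf-8"):
    """Return key as a text string, decoding it first if it is a byte string.

    Hypothetical helper for illustration only; not part of spaCy.
    """
    if isinstance(key, bytes):  # in Python 2, bytes is an alias for str
        return key.decode(encoding)
    return key


# A byte string is decoded to text; a text string is returned as-is.
print(as_text(b"NASA") == u"NASA")
print(as_text(u"NASA") == u"NASA")
```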

With the u'NASA' change in place, you'll then get this error:

AttributeError: 'spacy.lexeme.Lexeme' object has no attribute 'has_repvec'

There's a similar issue on the spaCy repo; both can be fixed by replacing has_repvec with has_vector and repvec with vector. I'll comment on that GitHub thread as well.
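If the same script has to run against both the older and the newer attribute names, something like the sketch below might help. has_vector and get_vector are hypothetical helpers, and OldLexeme and NewLexeme are stand-in classes so the snippet runs without spaCy installed; the only things taken from this thread are the four attribute names.

```python
def has_vector(lexeme):
    """True if the object reports a vector under the new or the old attribute name."""
    if hasattr(lexeme, "has_vector"):
        return bool(lexeme.has_vector)
    return bool(getattr(lexeme, "has_repvec", False))


def get_vector(lexeme):
    """Return the word vector, preferring the new name and falling back to the old one."""
    if hasattr(lexeme, "vector"):
        return lexeme.vector
    return lexeme.repvec


class OldLexeme(object):
    """Stand-in for an object exposing the old attribute names (assumption)."""
    has_repvec = True
    repvec = [1.0, 2.0]


class NewLexeme(object):
    """Stand-in for an object exposing the new attribute names (assumption)."""
    has_vector = True
    vector = [3.0, 4.0]


for lex in (OldLexeme(), NewLexeme()):
    print(has_vector(lex), get_vector(lex))
```

I have only tried the idea on these stand-in classes, so treat it as a starting point rather than something verified against every spaCy release.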

Complete, updated code I used:

import spacy

from numpy import dot
from numpy.linalg import norm

parser = spacy.load('en')
nasa = parser.vocab[u'NASA']

# cosine similarity
cosine = lambda v1, v2: dot(v1, v2) / (norm(v1) * norm(v2))

# gather all known words, take only the lowercased versions
allWords = list({w for w in parser.vocab if w.has_vector and w.orth_.islower() and w.lower_ != "nasa"})

# sort by similarity to NASA
allWords.sort(key=lambda w: cosine(w.vector, nasa.vector))
allWords.reverse()
print("Top 10 most similar words to NASA:")
for word in allWords[:10]:
    print(word.orth_)
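
In case it helps to check the ranking logic without loading a model, here is a small sketch with made-up three-dimensional vectors (the words and numbers are invented for illustration; they are not real spaCy vectors). It uses plain Python instead of NumPy so it has no dependencies, and sorted(..., reverse=True) in place of sort() followed by reverse().

```python
import math


def cosine(v1, v2):
    """Cosine similarity: dot(v1, v2) / (norm(v1) * norm(v2))."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


# Invented toy vectors, purely for illustration.
toy_vectors = {
    "spacecraft": [0.9, 0.1, 0.0],
    "orbit": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}
target = [1.0, 0.0, 0.0]  # plays the role of the NASA vector

# Rank words from most to least similar to the target.
ranked = sorted(
    toy_vectors,
    key=lambda w: cosine(toy_vectors[w], target),
    reverse=True,
)
for word in ranked:
    print(word)
```

Note that a zero vector would cause a division by zero here, which is one reason the real code filters on has_vector first.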

Hope this helps!
