How to make a vector of neighbor words?
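I am working on an NLP program. For each word, I want to build a vector that holds its four surrounding neighbors (two on each side) wherever possible. For example, given the sentence: I go to school every day, the vector for the word school is: V = [go, to, every, day]. That is the simple case, and I can handle it. But for words at the beginning or end of a sentence, I don't get the result I want. For example, for the word I the vector should be V = [0, 0, go, to], but the output is [go, to]. Likewise, for the word go the vector should be [0, I, to, school]. Can anyone help me solve this?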
xx=[contains some words]
for text in sentences:
    text = text.lower().split()
    for i in range(len(text)):
        token = text[i]
        if token not in xx:
            n1 = text[i-2 : i]
            n2 = text[i+1 : i+1+window_size]
            print(n1, n2, n1+n2)
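The missing left neighbors appear to come from the slice `text[i-2 : i]`: when `i` is 0 or 1 the start index is negative, and Python counts negative indices from the end of the list, so the slice comes back empty instead of short. The sketch below shows that behavior and one possible way to clamp and pad the left side. The helper name `left_context` and its `half` and `pad` parameters are made up for illustration and are not part of the question's code.

```python
text = "i go to school every day".split()

# For i = 0 the slice is text[-2:0]; the start is counted from the end,
# so the slice is empty rather than "two missing neighbours".
print(text[0 - 2:0])
# The same happens for i = 1, where the slice is text[-1:1].
print(text[1 - 2:1])


def left_context(tokens, i, half=2, pad="0"):
    """Return the `half` tokens before position i, left-padded with `pad`."""
    # Clamp the start at 0 so it never wraps around to the end of the list.
    left = tokens[max(0, i - half):i]
    return [pad] * (half - len(left)) + left


print(left_context(text, 0))
print(left_context(text, 1))
print(left_context(text, 3))
```

The right side does not need clamping, because a slice that runs past the end of a list is simply truncated; it only needs padding.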
Hope this helps. I simply check whether there are enough words to the left and to the right!
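def get_surrounding(sentence='', word='', window_size=4):
    # Split the sentence around the first occurrence of the word
    l, r = sentence.split(word)[:2]
    # Keep at most window_size // 2 words on each side
    l = l.strip().split()[-window_size//2:]
    r = r.strip().split()[:window_size//2]
    # Pad with '0' wherever there are not enough neighbors
    l = ['0']*(window_size//2 - len(l)) + l
    r = r + ['0']*(window_size//2 - len(r))
    return l + r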
s = 'I go to school every day'
print(get_surrounding(sentence=s, word='I', window_size=4))
print(get_surrounding(sentence=s, word='go', window_size=4))
print(get_surrounding(sentence=s, word='to', window_size=4))
print(get_surrounding(sentence=s, word='school', window_size=4))
print(get_surrounding(sentence=s, word='every', window_size=4))
print(get_surrounding(sentence=s, word='day', window_size=4))
['0', '0', 'go', 'to']
['0', 'I', 'to', 'school']
['I', 'go', 'school', 'every']
['go', 'to', 'every', 'day']
['to', 'school', 'day', '0']
['school', 'every', '0', '0']
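One caveat: `sentence.split(word)` matches substrings, so a word such as go would also split inside going. A minimal sketch that works on token positions instead is shown below. The function name `get_surrounding_tokens` and the `pad` parameter are hypothetical, and it assumes the word occurs in the sentence as a whole token (only the first occurrence is used).

```python
def get_surrounding_tokens(sentence, word, window_size=4, pad='0'):
    """Return window_size // 2 neighbours on each side of `word`, padded with `pad`."""
    tokens = sentence.split()
    half = window_size // 2
    # First exact token match; raises ValueError if the word is absent.
    i = tokens.index(word)
    # Clamp the left slice at 0 so a negative start cannot wrap around.
    left = tokens[max(0, i - half):i]
    right = tokens[i + 1:i + 1 + half]
    return [pad] * (half - len(left)) + left + right + [pad] * (half - len(right))


s = 'I am going to go home'
print(get_surrounding_tokens(s, 'go'))  # ['going', 'to', 'home', '0']
```

On the sentence from the question this gives the same vectors as the output listed above.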
You can always preprocess the data to fit your needs.
sentence = 'i go to school every day'

def get_neighbors(sentence, num_neighbors):
    # Preprocess the sentence and pad both margins with a default value
    default = 0
    words = sentence.split()
    margin = num_neighbors // 2
    padded = [default] * margin + words + [default] * margin

    ans = []
    for i in range(margin, margin + len(words)):
        # margin words to the left and margin words to the right of position i
        neighbours = padded[i - margin:i] + padded[i + 1:i + 1 + margin]
        ans.append(neighbours)
    return ans

if __name__ == '__main__':
    print(sentence)
    print(get_neighbors(sentence, 4))
user@Inspiron:~/code/general$ python get_neighbors.py
i go to school every day
[[0, 0, 'go', 'to'], [0, 'i', 'to', 'school'], ['i', 'go', 'school', 'every'], ['go', 'to', 'every', 'day'], ['to', 'school', 'day', 0], ['school', 'every', 0, 0]]
user@Inspiron:~/code/general$
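To tie the padding idea back to the loop in the question, which lowercases each sentence and skips words found in a list, something along these lines might work. This is only a sketch: the function name `neighbor_vectors`, the `skip` parameter (standing in for the question's `xx` list), and the sample input are assumptions, not code from the question or the answers.

```python
def neighbor_vectors(sentences, skip=(), window_size=4, pad=0):
    """Return (token, neighbours) pairs for every token not listed in `skip`."""
    half = window_size // 2
    skip = set(skip)
    result = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pad both ends once so every real token has `half` slots on each side.
        padded = [pad] * half + tokens + [pad] * half
        for i, token in enumerate(tokens):
            if token in skip:
                continue
            j = i + half  # position of this token inside the padded list
            result.append((token, padded[j - half:j] + padded[j + 1:j + 1 + half]))
    return result


for token, vec in neighbor_vectors(['I go to school every day'], skip=['to']):
    print(token, vec)
```

Skipped words are only left out as targets; they still appear as neighbors of other words, which matches what the loop in the question does.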