
How to make a vector of neighboring words?

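I'm working on an NLP program. For each word, I want to build a vector that holds its four surrounding neighbors (two on each side). For example, given the sentence "I go to school every day", the vector for the word "school" is V = [go, to, every, day]. That is the simple case, and I can handle it. But for words at the beginning or end of a sentence, I don't get the result I want. For example, for the word "I" the vector should be V = [0, 0, go, to], but the output is [go, to]. Likewise, for the word "go" the vector should be [0, I, to, school]. Can anyone help me solve this?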

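xx = [...]  # placeholder: contains some words (elided in the question)
for text in sentences:
    text = text.lower().split()
    for i in range(len(text)):
        token = text[i]
        if token not in xx:
            # Two words to the left; for i < 2 the start index i-2 is negative,
            # so Python counts from the end and the slice comes back empty
            n1 = text[i-2 : i]
            # Words to the right; slicing past the end just returns fewer items
            n2 = text[i+1 : i+1+window_size]
            print(n1, n2, n1+n2)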
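
The short result at the sentence boundaries appears to come from Python's slice semantics rather than from the loop itself: a negative start index counts from the end of the list. A minimal sketch, assuming the example sentence from the question (the variable names here are illustrative and not part of the original code):

```python
words = 'i go to school every day'.split()

# i = 0: the slice is words[-2:0]; it starts near the end and stops at 0, so it is empty
left_of_first = words[0 - 2:0]

# i = 1: the slice is words[-1:1]; also empty, so 'i' is lost for the word 'go'
left_of_second = words[1 - 2:1]

# Clamping the start at 0 keeps the words that do exist
left_clamped = words[max(0, 1 - 2):1]

print(left_of_first, left_of_second, left_clamped)
```

Clamping alone still leaves the vector shorter than four items, which is why the answers pad the missing positions with a default value.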

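Hope this helps! I simply check whether there are enough words on the left and on the right, and pad with '0' where there are not.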

def get_surrounding(sentence='', word='', window_size=4):
    half = window_size // 2
    # Work on whole tokens so that a word is never matched inside a longer word
    words = sentence.split()
    i = words.index(word)  # first occurrence; raises ValueError if the word is absent
    # Take up to `half` neighbours on each side, clamping at the sentence start
    l = words[max(0, i - half):i]
    r = words[i + 1:i + 1 + half]
    # Pad with '0' wherever the sentence boundary cut the window short
    l = ['0'] * (half - len(l)) + l
    r = r + ['0'] * (half - len(r))
    return l + r

s = 'I go to school every day'
print(get_surrounding(sentence=s, word='I', window_size=4))
print(get_surrounding(sentence=s, word='go', window_size=4))
print(get_surrounding(sentence=s, word='to', window_size=4))
print(get_surrounding(sentence=s, word='school', window_size=4))
print(get_surrounding(sentence=s, word='every', window_size=4))
print(get_surrounding(sentence=s, word='day', window_size=4))

Output:
['0', '0', 'go', 'to']
['0', 'I', 'to', 'school']
['I', 'go', 'school', 'every']
['go', 'to', 'every', 'day']
['to', 'school', 'day', '0']
['school', 'every', '0', '0']
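
Looking a word up by its text only finds one occurrence, so in a sentence with a repeated word every copy would get the same vector. A position-based variant avoids that. The following is only a sketch under that assumption; the function name `surrounding_at` and its `pad` parameter are hypothetical and not part of the answer above:

```python
def surrounding_at(tokens, i, window_size=4, pad='0'):
    """Return window_size // 2 neighbours on each side of tokens[i], padded with pad."""
    half = window_size // 2
    left = tokens[max(0, i - half):i]      # clamp so the slice never wraps around
    right = tokens[i + 1:i + 1 + half]     # slicing past the end is safe
    return [pad] * (half - len(left)) + left + right + [pad] * (half - len(right))

tokens = 'to be or not to be'.split()
vectors = [surrounding_at(tokens, i) for i in range(len(tokens))]
for token, vector in zip(tokens, vectors):
    print(token, vector)
```

Here the two occurrences of "to" get different vectors because each one is addressed by its index rather than by its text.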

You can always preprocess the data to suit your needs.

sentence = 'i go to school every day'

def get_neighbors(sentence, num_neighbors):
    # Preprocess the sentence and pad both margins with a default value
    default = 0
    margin = num_neighbors // 2
    words = [default] * margin + sentence.split() + [default] * margin

    ans = []
    # Visit only the real words, skipping the padding on either side
    for i in range(margin, len(words) - margin):
        # `margin` items to the left plus `margin` items to the right
        neighbours = words[i - margin:i] + words[i + 1:i + 1 + margin]
        ans.append(neighbours)
    return ans


if __name__ == '__main__':
    print(sentence)
    print(get_neighbors(sentence, 4))

user@Inspiron:~/code/general$ python get_neighbors.py 
i go to school every day
[[0, 0, 'go', 'to'], [0, 'i', 'to', 'school'], ['i', 'go', 'school', 'every'], ['go', 'to', 'every', 'day'], ['to', 'school', 'day', 0], ['school', 'every', 0, 0]]
user@Inspiron:~/code/general$ 
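
The padding idea can also be folded back into the shape of the question's loop, which lower-cases several sentences and skips some words. A rough sketch, assuming the skip list plays the role of `xx` in the question; the names `neighbor_vectors` and `skip_words` are hypothetical:

```python
def neighbor_vectors(sentences, skip_words=(), window_size=4, pad=0):
    """Return (token, neighbours) pairs for every token not in skip_words."""
    half = window_size // 2
    result = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pad once per sentence so every real token has a full window
        padded = [pad] * half + tokens + [pad] * half
        for i, token in enumerate(tokens):
            if token in skip_words:
                continue
            j = i + half  # position of this token inside the padded list
            result.append((token, padded[j - half:j] + padded[j + 1:j + 1 + half]))
    return result

pairs = neighbor_vectors(['I go to school every day'], skip_words={'to'})
for token, vector in pairs:
    print(token, vector)
```

Note that in this sketch a skipped word gets no vector of its own but still appears as a neighbour of other words; whether that is the desired behaviour depends on what `xx` is meant to do.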



