

Averaging Vectors from Documents

How could I use the medium-sized spaCy model en_core_web_md to parse through a folder of documents, get the individual vectors for each word in each document, and then average them together?

import spacy
nlp = spacy.load("en_core_web_md")

First, load all the documents into a Python list using standard file I/O.
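
As a minimal sketch of that step (assuming the documents are plain-text .txt files in a hypothetical folder named docs; adjust the path and extension to your data):

import os

folder = "docs"  # hypothetical folder of plain-text documents
documents_list = []
for filename in sorted(os.listdir(folder)):
    if filename.endswith(".txt"):
        with open(os.path.join(folder, filename), encoding="utf-8") as f:
            documents_list.append(f.read())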

# Documents loaded into a Python list.
documents_list = ['Hello, world', 'Here are two sentences.']
# Iterate over each document and run it through the nlp pipeline.
for doc in documents_list:
    doc_nlp = nlp(doc)
    # doc_nlp.vector is the average of the token vectors in the document.
    print(doc_nlp.vector)
    for token in doc_nlp:
        # Each token's text and its individual word vector.
        print(token.text, token.vector)
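
If you also want a single vector averaged across all the documents (rather than the per-document average that doc_nlp.vector already gives), one way is to stack the document vectors and take their mean with numpy; this is a sketch under that assumption, not part of the original answer:

import numpy as np

# One vector per document, then the mean across documents.
doc_vectors = [nlp(doc).vector for doc in documents_list]
corpus_vector = np.mean(np.stack(doc_vectors), axis=0)
print(corpus_vector.shape)  # (300,) for en_core_web_md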

Let me know if you need any clarification.
