
How to find the semantic similarity between a word and a list of words in R?

I was attempting to find the semantic similarity between a single word and each word in a list that I generated online using the English Lexicon Project. Here is the list of words I was using. I converted it to a string so that I could train a word2vec() model on it later. There are 20 words in this list.

library(word2vec)
library(readr)
library(NLP)

word.list <- read.delim("wordlist.txt", header = FALSE)
str(word.list)
'data.frame':   1 obs. of  1 variable:
 $ V1: chr "postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino "| __truncated__

    word.list <- structure(list(V1 = "postage", V2 = "enzyme", V3 = "textbook", V4 = "requirement", V5 = "installation", V6 = "eyeglasses", V7 = "numerical", V8 = "priesthood", V9 = "fence", V10 = "assemble", V11 = "extract", V12 = "domino", V13 = "square", V14 = "deduction", V15 = "predecessor", V16 = "liaison", V17 = "launder", V18 = "canteen", V19 = "cashier", V20 = "informal"), class = "data.frame", row.names = c(NA, -1L))

word.list.ch <- as.String(word.list)
 chr "postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino "| __truncated__

Following an example online ( http://www.bnosac.be/index.php/blog/100-word2vec-in-r ), I tried to train a model on these words.

model<- word2vec(word.list.ch, type = "cbow", dim = 20, iter = 20)

However, it says that the file or directory does not exist:

Training failed: fileMapper: postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino square deduction predecessor liaison launder canteen cashier informal - No such file or directory
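For context: in at least some versions of the word2vec R package, a character argument of length 1 is handed to the underlying C++ code as a *file path*, which is why the entire word list appears inside a "No such file or directory" error. A minimal sketch of a workaround, assuming the current CRAN word2vec API (the tempfile is illustrative), is to write the text to an actual file first:

```r
library(word2vec)

txt <- "postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino square deduction predecessor liaison launder canteen cashier informal"

## Write the text to a real file so a length-1 character argument
## cannot be mistaken for a (non-existent) file path.
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

## min_count = 1 because every word occurs only once here; with the
## package default of min_count = 5 the vocabulary would be empty.
model <- word2vec(tmp, type = "cbow", dim = 20, iter = 20, min_count = 1)
```

This avoids the file error, but as the answer below notes, a 20-word corpus is far too small to learn meaningful embeddings from.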

Could I ask why this is the case, and whether there is an alternative solution?

Many thanks in advance!

Data (pasted from comment):

structure(list(V1 = "postage", V2 = "enzyme", V3 = "textbook", V4 = "requirement", V5 = "installation", V6 = "eyeglasses", V7 = "numerical", V8 = "priesthood", V9 = "fence", V10 = "assemble", V11 = "extract", V12 = "domino", V13 = "square", V14 = "deduction", V15 = "predecessor", V16 = "liaison", V17 = "launder", V18 = "canteen", V19 = "cashier", V20 = "informal"), class = "data.frame", row.names = c(NA, -1L))

You need to train your word2vec model on a character vector of length > 1.

Training on just 20 words does not make sense; make sure you have a large corpus to train your word2vec model on.
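As a sketch of the alternative: rather than training on 20 words, load embeddings pretrained on a large corpus and compare your words against them. The file name "pretrained.bin" below is a placeholder for whatever model you download, but read.word2vec(), predict(..., type = "embedding") and word2vec_similarity() are all part of the word2vec package:

```r
library(word2vec)

## "pretrained.bin" is a placeholder: point this at a word2vec binary
## trained on a large corpus (e.g. a model linked from the bnosac blog
## post mentioned in the question).
model <- read.word2vec("pretrained.bin", normalize = TRUE)

target <- "fence"
words  <- c("postage", "enzyme", "textbook", "requirement", "installation")

## Look up the embeddings, then take cosine similarities between the
## target word and every word in the list.
emb <- predict(model, newdata = c(target, words), type = "embedding")
word2vec_similarity(emb[target, , drop = FALSE], emb[words, ], type = "cosine")
```

Each row/column pair of the result is the cosine similarity between the target word and one list word; words missing from the pretrained vocabulary come back as NA embeddings.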
