
How to find the semantic similarity between a word and a list of words in R?

I was attempting to find the semantic similarity between a single word and each word in a list that I generated online using the English Lexicon Project. Here is the list of words I was using. I converted it to a string so that I could train a word2vec() model on it later. There are 20 words in this list.

library(word2vec)
library(readr)
library(NLP)

word.list <- read.delim("wordlist.txt", header = FALSE)
str(word.list)
'data.frame':   1 obs. of  1 variable:
 $ V1: chr "postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino "| __truncated__

    word.list <- structure(list(V1 = "postage", V2 = "enzyme", V3 = "textbook", V4 = "requirement", V5 = "installation", V6 = "eyeglasses", V7 = "numerical", V8 = "priesthood", V9 = "fence", V10 = "assemble", V11 = "extract", V12 = "domino", V13 = "square", V14 = "deduction", V15 = "predecessor", V16 = "liaison", V17 = "launder", V18 = "canteen", V19 = "cashier", V20 = "informal"), class = "data.frame", row.names = c(NA, -1L))

word.list.ch <- as.String(word.list)
 chr "postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino "| __truncated__

Following an example online ( http://www.bnosac.be/index.php/blog/100-word2vec-in-r ), I tried to train a model on these words.

model<- word2vec(word.list.ch, type = "cbow", dim = 20, iter = 20)

However, it says that the file or directory does not exist:

Training failed: fileMapper: postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino square deduction predecessor liaison launder canteen cashier informal - No such file or directory
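For context: in at least some versions of the word2vec R package, a character argument of length 1 is handed to the underlying C++ code as a *file path*, which is why the entire word list appears inside a "No such file or directory" error. A minimal sketch of a workaround, assuming the current CRAN word2vec API (the tempfile is illustrative), is to write the text to an actual file first:

```r
library(word2vec)

txt <- "postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino square deduction predecessor liaison launder canteen cashier informal"

## Write the text to a real file so a length-1 character argument
## cannot be mistaken for a (non-existent) file path.
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

## min_count = 1 because every word occurs only once here; with the
## package default of min_count = 5 the vocabulary would be empty.
model <- word2vec(tmp, type = "cbow", dim = 20, iter = 20, min_count = 1)
```

This avoids the file error, but as the answer below notes, a 20-word corpus is far too small to learn meaningful embeddings from.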

Could I ask why this is the case, and whether there is an alternative solution?

Many thanks in advance!

Data (pasted from comment):

structure(list(V1 = "postage", V2 = "enzyme", V3 = "textbook", V4 = "requirement", V5 = "installation", V6 = "eyeglasses", V7 = "numerical", V8 = "priesthood", V9 = "fence", V10 = "assemble", V11 = "extract", V12 = "domino", V13 = "square", V14 = "deduction", V15 = "predecessor", V16 = "liaison", V17 = "launder", V18 = "canteen", V19 = "cashier", V20 = "informal"), class = "data.frame", row.names = c(NA, -1L))

You need to train your word2vec model on a character vector of length > 1.

Training on just 20 words does not make sense; make sure you have a large corpus to train your word2vec model on.
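As a sketch of the alternative: rather than training on 20 words, load embeddings pretrained on a large corpus and compare your words against them. The file name "pretrained.bin" below is a placeholder for whatever model you download, but read.word2vec(), predict(..., type = "embedding") and word2vec_similarity() are all part of the word2vec package:

```r
library(word2vec)

## "pretrained.bin" is a placeholder: point this at a word2vec binary
## trained on a large corpus (e.g. a model linked from the bnosac blog
## post mentioned in the question).
model <- read.word2vec("pretrained.bin", normalize = TRUE)

target <- "fence"
words  <- c("postage", "enzyme", "textbook", "requirement", "installation")

## Look up the embeddings, then take cosine similarities between the
## target word and every word in the list.
emb <- predict(model, newdata = c(target, words), type = "embedding")
word2vec_similarity(emb[target, , drop = FALSE], emb[words, ], type = "cosine")
```

Each row/column pair of the result is the cosine similarity between the target word and one list word; words missing from the pretrained vocabulary come back as NA embeddings.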
