
How to find the semantic similarity between a word and a list of words in R?

I am trying to find the semantic similarity between a single word and each word in a list I generated online using an English dictionary project. Here is the word list I used. I converted it to a string so that I could train a model on it later with the word2vec() function. There are 20 words in the list.

library(word2vec)
library(readr)
library(NLP)

word.list <- read.delim("wordlist.txt", header = FALSE)
str(word.list)
'data.frame':   1 obs. of  1 variable:
 $ V1: chr "postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino "| __truncated__

word.list.ch <- as.String(word.list)
 chr "postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino "| __truncated__

Following an online example (http://www.bnosac.be/index.php/blog/100-word2vec-in-r), I tried to train a model on these words.

model <- word2vec(word.list.ch, type = "cbow", dim = 20, iter = 20)

However, it fails, saying the file or directory does not exist.

Training failed: fileMapper: postage enzyme textbook requirement installation eyeglasses numerical priesthood fence assemble extract domino square deduction predecessor liaison launder canteen cashier informal - No such file or directory

Why does this happen, and is there an alternative solution?

Thanks in advance!

Data (pasted from the comments)

structure(list(V1 = "postage", V2 = "enzyme", V3 = "textbook", V4 = "requirement", V5 = "installation", V6 = "eyeglasses", V7 = "numerical", V8 = "priesthood", V9 = "fence", V10 = "assemble", V11 = "extract", V12 = "domino", V13 = "square", V14 = "deduction", V15 = "predecessor", V16 = "liaison", V17 = "launder", V18 = "canteen", V19 = "cashier", V20 = "informal"), class = "data.frame", row.names = c(NA, -1L))

You need to train the word2vec model on a character vector with length > 1. Passing a single collapsed string (a length-1 character vector) is apparently interpreted as the path to a training file, which is why you get the "No such file or directory" error above.
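A minimal sketch of the difference, using the dput data frame from the question (unlist() is only there to turn its 20 columns into 20 separate character elements):

library(word2vec)

# A length-1 character vector (one collapsed string) is apparently taken as a
# path to a training file, which reproduces the error from the question:
# word2vec(x = word.list.ch, type = "cbow", dim = 20, iter = 20)

# word2vec() instead wants a character vector with more than one element,
# typically one sentence or document per element
txt <- unlist(word.list, use.names = FALSE)
length(txt)
# [1] 20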

It makes no sense to train on just 20 words; make sure you have a large corpus to train your word2vec model on.
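Once a larger corpus is available, the workflow the question asks about could look roughly like this. It is only a sketch under assumptions: corpus.txt is a placeholder file with one sentence per line, "stamp" stands in for the unspecified target word, and any word missing from the model's vocabulary comes back as NA.

library(word2vec)

# Placeholder corpus: a reasonably large text file with one sentence per line
corpus <- tolower(readLines("corpus.txt"))

# Train the embeddings on the corpus, not on the 20-word list itself
model <- word2vec(x = corpus, type = "cbow", dim = 50, iter = 20)

# Embedding of the single target word ("stamp" is a hypothetical stand-in)
target <- predict(model, newdata = "stamp", type = "embedding")

# Embeddings of the 20 words from the question's list
words <- unlist(word.list, use.names = FALSE)
emb <- predict(model, newdata = words, type = "embedding")

# Cosine similarity between the target word and every word in the list
word2vec_similarity(target, emb, type = "cosine")

If no suitable corpus is at hand, pretrained embeddings loaded with read.word2vec() from the same package are another option.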

