Text mining in R
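I would like to create a TDM (term-document matrix) from text using specific phrases (combinations of two or more words) rather than single words. The phrases could be "climate change", "global worming", "land use", and so on. All the examples I have seen use single words only.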
tabela = DocumentTermMatrix(textolimpo,
    list(dictionary = c("climate change", "global worming", "land use")))
I would appreciate any help.
Cheers,
Rafael
I recommend quanteda:
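library(quanteda)
# Note: this is older quanteda syntax; newer releases have deprecated or
# removed the ngrams/dictionary/thesaurus arguments of dfm().
textolimpo <- c("This climate change concerns me. This climate changes",
                "Wormed: global worming increased")
# Form bigrams (joined with "_"), then count only those matching the
# dictionary patterns; each key becomes one feature.
(dfm <- dfm(textolimpo,
            ngrams = 2L,
            dictionary = list(climate = "climate_change",
                              warm = "global_worming"),
            valuetype = "regex"))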
# 2 x 2 sparse Matrix of class "dfmSparse"
#        features
# docs    climate warm
#   text1       2    0
#   text2       0    1
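As a dependency-free cross-check of the counts above, here is a minimal base R sketch that counts literal phrase occurrences per document. It is not how quanteda works internally; the helper name `count_phrase` and the object `dtm` are made up for illustration. Because it matches substrings, "climate changes" also counts as a hit for "climate change", much like the regex match above.

```r
docs <- c("This climate change concerns me. This climate changes",
          "Wormed: global worming increased")
phrases <- c("climate change", "global worming")

# Hypothetical helper: number of literal (substring) matches of one
# phrase in each text, ignoring case.
count_phrase <- function(phrase, texts) {
  m <- gregexpr(phrase, tolower(texts), fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count positions > 0
  vapply(m, function(x) sum(x > 0L), integer(1))
}

counts <- vapply(phrases, count_phrase, integer(length(docs)), texts = docs)

# Documents in rows, phrases in columns
dtm <- matrix(counts, nrow = length(docs),
              dimnames = list(paste0("text", seq_along(docs)), phrases))
print(dtm)
```

This only scales to a short, fixed phrase list, but it makes it easy to see what the dictionary lookup is counting.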
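# Same call with thesaurus = instead of dictionary =: bigrams that do not
# match are kept, and matches are counted under the upper-cased key.
(dfm <- dfm(textolimpo,
            ngrams = 2L,
            thesaurus = list(climate = "climate_change",
                             warm = "global_worming"),
            valuetype = "regex"))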
# 2 x 8 sparse Matrix of class "dfmSparse"
#       this_climate change_concerns concerns_me me_this wormed_global worming_increased CLIMATE WARM
# text1            2               1           1       1             0                 0       2    0
# text2            0               0           0       0             1                 1       0    1
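For more recent quanteda releases, where tokenising and dictionary lookup are separate steps, something along these lines should give a comparable result. Treat it as a sketch rather than a drop-in replacement: it assumes a quanteda version that provides `tokens()`, `dictionary()` and `tokens_lookup()`, and the object names `dict`, `toks`, `dfm_dict` and `dfm_thes` are my own. It uses a glob pattern (`change*`) instead of a regex, and the second call keeps unmatched single tokens rather than bigrams, so it is not identical to the thesaurus output above.

```r
library(quanteda)

textolimpo <- c("This climate change concerns me. This climate changes",
                "Wormed: global worming increased")

# Dictionary values containing a space are matched as token sequences;
# the trailing * (glob) lets "change" also match "changes".
dict <- dictionary(list(climate = "climate change*",
                        warm    = "global worming"))

toks <- tokens(textolimpo, remove_punct = TRUE)

# Keep only the dictionary keys (comparable to dictionary = above)
dfm_dict <- dfm(tokens_lookup(toks, dictionary = dict))

# Keep unmatched tokens as well (similar in spirit to thesaurus = above)
dfm_thes <- dfm(tokens_lookup(toks, dictionary = dict, exclusive = FALSE))

print(dfm_dict)
```

Doing the lookup on the tokens means no underscore-joined bigrams are needed, and the phrase list can be written the same way it appears in the question.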