R Text mining - how to change texts in R data frame column into several columns with bigram frequencies?
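As a follow-up to the question "R Text mining - how to change texts in R data frame column into several columns with word frequencies?", I would like to know how to produce columns that hold bigram frequencies instead of just single-word frequencies. Thanks again!
Here is the example data frame (thanks to Tyler Rinker).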
       person sex adult                               state code
 1        sam   m     0       Computer is fun. Not too fun.   K1
 2       greg   m     0             No it's not, it's dumb.   K2
 3    teacher   m     1                  What should we do?   K3
 4        sam   m     0                You liar, it stinks!   K4
 5       greg   m     0             I am telling the truth!   K5
 6      sally   f     0              How can we be certain?   K6
 7       greg   m     0                    There is no way.   K7
 8        sam   m     0                     I distrust you.   K8
 9      sally   f     0         What are you talking about?   K9
10 researcher   f     1        Shall we move on? Good then.  K10
11       greg   m     0 I'm hungry. Let's eat. You already?  K11
The data set above is available via:
library(qdap); DATA
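The development version of qdap (which should reach CRAN within the next few days) does ngrams. For now you will need to use the dev version. On the toy data set this is quick, but on a larger data set, such as qdap's mraja1 data set, it takes about 5 minutes to complete. You could:
Here is the code to get the development version of qdap and run the bigram search: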
library(devtools)
## current devtools expects "user/repo"; the original call was install_github("qdap", "trinker")
install_github("trinker/qdap")
library(qdap)
## this gets the bigrams
bigrams <- sapply(ngrams(DATA$state)[[c("all_n", "n_2")]], paste, collapse=" ")
## This searches by grouping variable for bigram use
termco(DATA$state, DATA$person, bigrams)
## To get raw values
termco(DATA$state, DATA$person, bigrams)[["raw"]]
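For readers who want to see what a per-person bigram count involves without installing qdap, here is a minimal base R sketch. It is not qdap's implementation: the helper `get_bigrams` is a hypothetical name introduced for illustration, the toy data frame copies only four rows of the example, and the tokenization (lower-casing, stripping punctuation, pairing words across sentence boundaries) is an assumption that may differ from what `ngrams` does.

```r
# Four rows copied from the example data above (sam and greg only)
df <- data.frame(
  person = c("sam", "greg", "sam", "greg"),
  state  = c("Computer is fun. Not too fun.",
             "No it's not, it's dumb.",
             "You liar, it stinks!",
             "I am telling the truth!"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case the text, drop punctuation (keeping
# apostrophes), split on spaces, and paste each word to its successor.
# Assumption: sentence boundaries are ignored when pairing words.
get_bigrams <- function(text) {
  cleaned <- tolower(gsub("[^[:alpha:]' ]", "", text))
  words <- strsplit(cleaned, " +")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# One character vector of bigrams per row
bigram_list <- lapply(df$state, get_bigrams)

# Long format: one (person, bigram) pair per bigram occurrence
long <- data.frame(
  person = rep(df$person, lengths(bigram_list)),
  bigram = unlist(bigram_list),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per person, one column per bigram
counts <- table(long$person, long$bigram)

# Wide data frame with bigram frequencies as columns
freq <- as.data.frame.matrix(counts)
print(freq)
```

The cross-tabulation grows one column per distinct bigram, so on a large corpus it may be worth filtering to the more frequent bigrams before building the wide table.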