
How to merge sentiment analysis results (dfm) with the original readtext object in quanteda?

I have been using quanteda's tokens_lookup() function with the Lexicoder Sentiment Dictionary (Young and Soroka, data_dictionary_LSD2015) to count the positive and negative words in tweets by politicians.

Once I have the results, is there a way to add these counts as columns back into the original readtext object, alongside its docvars?

> head(dat)
readtext object consisting of 6 documents and 11 docvars.
# Description: df[,13] [6 × 13]
  doc_id   text       date      username   to      replies retweets favorites geo   mentions   hashtags        id permalink               
* <chr>    <chr>      <chr>     <chr>      <chr>     <int>    <int>     <int> <lgl> <chr>      <chr>        <dbl> <chr>                   
1 trump.c… "\"Sleepy… 2020-05-… realDonal… MZHemi…    5415    13062     39680 NA    @AjitPaiF… ""       1.84e-224 https://twitter.com/rea…
2 trump.c… "\"He got… 2020-05-… realDonal… mikand…   20406    39081    111370 NA    ""         ""       1.84e-224 https://twitter.com/rea…
3 trump.c… "\"Thank … 2020-05-… realDonal… mikand…    5733    17293     66992 NA    ""         ""       1.84e-224 https://twitter.com/rea…
4 trump.c… "\".@CBS … 2020-05-… realDonal… ""        22215    25834     93625 NA    @CBS @60M… ""       1.83e-224 https://twitter.com/rea…
5 trump.c… "\"This b… 2020-05-… realDonal… GreggJ…    5379    11403     39869 NA    ""         ""       1.81e-224 https://twitter.com/rea…
6 trump.c… "\"OBAMAG… 2020-05-… realDonal… ""        55960    89664    320171 NA    ""         ""       1.81e-224 https://twitter.com/rea…
> corp <- corpus(dat)
> toks <- tokens(corp, remove_punct = TRUE)
> toks_lsd <- tokens_lookup(toks, dictionary =  data_dictionary_LSD2015[1:2])
> dfmat_lsd <- dfm(toks_lsd)
> head(dfmat_lsd)
Document-feature matrix of: 6 documents, 2 features (66.7% sparse).
6 x 2 sparse Matrix of class "dfm"
             features
docs          negative positive
  trump.csv.1        2        0
  trump.csv.2        0        0
  trump.csv.3        0        1
  trump.csv.4        2        1
  trump.csv.5        0        0
  trump.csv.6        0        0

I've tried taking the required columns from the readtext object and building a new data.frame from them, which works okay, but it would be better if I could merge the dfm results back into the rest of the data.

You simply need to convert the dfm to a data.frame and combine it with the original data. Note that cbind() pairs rows by position, so this relies on both objects being in the same document order.

dat2 <- cbind(dat, convert(dfmat_lsd, to = "data.frame"))

Or, to make sure that each row is matched to the right document regardless of order, you can join the two datasets on the document ID:


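library(tidyverse)
# Older quanteda releases name the ID column `document`; if your version
# already returns `doc_id`, drop the rename() step.
data_sentiment <- convert(dfmat_lsd, to = "data.frame") %>% rename(doc_id = document)
# Keep every row of dat and attach the matching sentiment counts
dat2 <- left_join(dat, data_sentiment, by = "doc_id")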
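If you would rather avoid the tidyverse dependency, the same ID-based alignment can be sketched in base R with match(). The two small data frames below are hypothetical stand-ins, not the real data: `dat` plays the role of the readtext object and `sent` the data.frame converted from the dfm, deliberately put in a different row order. It also assumes the ID column is called `doc_id` in both.

```r
# Toy stand-ins (hypothetical data) for the readtext object and the
# data.frame converted from the dfm; `sent` is deliberately out of order.
dat <- data.frame(
  doc_id = c("trump.csv.1", "trump.csv.2", "trump.csv.3"),
  retweets = c(13062L, 39081L, 17293L),
  stringsAsFactors = FALSE
)
sent <- data.frame(
  doc_id = c("trump.csv.3", "trump.csv.1", "trump.csv.2"),
  negative = c(0, 2, 0),
  positive = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# For each document in `dat`, find its row position in `sent`
# (NA if the document has no row in `sent`).
idx <- match(dat$doc_id, sent$doc_id)

# Reorder the sentiment columns to follow `dat`, then bind them on.
dat2 <- cbind(dat, sent[idx, c("negative", "positive")])
rownames(dat2) <- NULL
print(dat2)
```

Documents in `dat` with no match in `sent` end up with NA in the sentiment columns, which is the same behaviour as the left join.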
