Time series analysis of text in R

Question

Suppose I have data like this:

df = data.frame(person = c('jim','john','pam','jim'),
                date = c('2018-01-01','2018-02-01','2018-03-01','2018-04-01'),
                text = c('the lonely engineer',
                         'tax season is upon us, engineers, do your taxes!',
                         'i am so lonely',
                         'rage coding is the best'))

I want to find the trending terms by date. How should I go about it?

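  library(quanteda)
  library(magrittr)  # provides %>%

  xCorp = corpus(df, text_field = 'text')
  # Numbers, punctuation, and symbols are removed at tokenization
  x = tokens(xCorp,
             remove_numbers = TRUE,
             remove_punct = TRUE,
             remove_symbols = TRUE) %>%
    tokens_remove(
      c(stopwords('english'), 'western digital', 'wd', 'nil'),
      padding = TRUE
    ) %>%
    dfm()
  # Combine documents that share a date (assumes quanteda >= 3 syntax)
  x2 = dfm_group(x, groups = date)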

This gets me part of the way there, but I am not sure whether it is the best approach.

Answer 1

Using the tidyverse (dplyr plus tidytext), I was able to do the following:

 library(dplyr)
 library(tidytext)

 # One row per (date, word) with its count, most frequent first
 df = df %>%
        group_by(date) %>%
        unnest_tokens(word, text) %>%
        count(word, sort = T)
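Raw counts per date do not by themselves say which terms are trending. One possible next step, offered here only as a sketch and not as part of the original answer, is to weight the counts with tf-idf so that words concentrated on a single date rank above words that appear on every date. The names `word_counts` and `trending`, the use of tidytext's built-in `stop_words` table, and the cutoff of three terms per date are all assumptions made for illustration.

```r
library(dplyr)
library(tidytext)

# Sample data from the question
df <- data.frame(
  person = c("jim", "john", "pam", "jim"),
  date = c("2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01"),
  text = c("the lonely engineer",
           "tax season is upon us, engineers, do your taxes!",
           "i am so lonely",
           "rage coding is the best"),
  stringsAsFactors = FALSE
)

# Count each word within each date, dropping common English stop words
# (stop_words is the lexicon table shipped with tidytext)
word_counts <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(date, word, sort = TRUE)

# Treat each date as a "document": tf-idf is high for words that are
# frequent on one date but rare across the other dates
trending <- word_counts %>%
  bind_tf_idf(word, date, n) %>%
  group_by(date) %>%
  slice_max(tf_idf, n = 3, with_ties = FALSE) %>%  # hypothetical cutoff
  ungroup()

print(trending)
```

With only four short texts the scores mean little, but on a larger corpus the same pipeline can be run after rounding `date` to a week or month to smooth out noise.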

Answer 1 (0, 2018-04-24 18:24:17)
