Creating a frequency plot of a specific word over a period of time from a data frame in R

Question

I have a data frame of tweets in R that looks like this:

  tweet_text                                              tweet_time          rdate        twt
  <chr>                                                   <dttm>              <date>     <dbl>
1 No New England cottage is complete without nautical t.. 2016-08-25 09:21:00 2016-08-25     1
2 Justice Scalia spent his last hours with members of co… 2016-11-24 16:28:00 2016-11-24     1
3 WHAT THE FAILED OKLAHOMA ABORTION BILL TELLS US http:/… 2016-11-24 16:27:00 2016-11-24     1
4 Bipartisan bill in US Senate to restrict US arms sales… 2016-10-26 07:03:00 2016-10-26     1
5 #MustResign campaign is underway with the heat p his S… 2016-10-01 08:15:00 2016-10-01     1

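Each tweet has a specific date, and all tweets in the data frame come from a one-year period. I want to find the frequency of a specific word (for example, "Senate") over the whole period and plot a graph that captures how that frequency changes over time. I'm new to R and can only think of overly complicated ways to do this, but I'm sure there must be a much simpler approach.

I'd appreciate any suggestions.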

Answer 1

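# Count case-insensitive matches of `pattern` in each element of `text`.
textFreq <- function(pattern, text){
    # gregexpr() returns one vector of match positions per element of `text`
    freq <- gregexpr(pattern = pattern, text = text, ignore.case = TRUE)
    freq <- lapply(freq, FUN = function(x){
            # a single -1 means "no match" for that element
            if(length(x)==1&&x==-1){
                return(0)
            } else {
                return(length(x))
            }
        })
    freq <- unlist(freq)
    return(freq)
}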

test.text <- c("senate.... SENate.. sen","Working in the senate...", "I like dogs")
textFreq(pattern = "senate", test.text)
# [1] 2 1 0
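
For comparison, the same count can probably be written more compactly in base R by pairing gregexpr() with regmatches() and lengths(). This is only a sketch of an equivalent helper; the name textFreqCompact is made up for illustration and is not part of the answer above.

```r
# Hypothetical compact equivalent of textFreq(), using base R only.
# regmatches() returns the matched substrings for each element of `text`
# (an empty vector when nothing matched), so lengths() gives the counts.
textFreqCompact <- function(pattern, text) {
    lengths(regmatches(text, gregexpr(pattern, text, ignore.case = TRUE)))
}

test.text <- c("senate.... SENate.. sen", "Working in the senate...", "I like dogs")
textFreqCompact("senate", test.text)
# [1] 2 1 0
```

Because unmatched elements yield an empty vector, no special handling of the -1 "no match" marker is needed here.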

You can use dplyr to group by time period and then use mutate:

library(dplyr)
library(magrittr)
# somedatefactor is a placeholder: replace it with the column that defines your
# time period (for example, if you wanted to aggregate every 10 days or something)
data <- data %>% 
    group_by(somedatefactor) %>%
    mutate(SenateFreqPerTweet = textFreq(pattern = "Senate", text = tweet_text),
           SenateFreqTotal = sum(SenateFreqPerTweet)) # sums within the current grouping
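
The grouping column is left open above. One way to build it, sketched here on made-up toy data, is to bin the rdate column into 10-day periods with base R's cut() and then collapse each period to one row with summarise(). The names tweets, period and freq_by_period are assumptions for illustration only.

```r
library(dplyr)

# Toy data standing in for the real tweet data frame (values are invented).
tweets <- data.frame(
    tweet_text = c("Senate votes today", "I like dogs",
                   "The senate and the SENATE", "No mention here"),
    rdate = as.Date(c("2016-08-01", "2016-08-05", "2016-08-14", "2016-08-25")),
    stringsAsFactors = FALSE
)

freq_by_period <- tweets %>%
    # cut() on a Date vector with breaks = "10 days" yields a factor of 10-day bins
    mutate(period = cut(rdate, breaks = "10 days")) %>%
    group_by(period) %>%
    # count case-insensitive matches per tweet, then sum them within each bin
    summarise(SenateFreqTotal = sum(lengths(regmatches(
                  tweet_text,
                  gregexpr("senate", tweet_text, ignore.case = TRUE)))),
              .groups = "drop")

freq_by_period
```

Using summarise() returns one row per period directly, which is an alternative to mutate() followed by distinct(); which of the two is preferable depends on whether the per-tweet counts are still needed afterwards.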

You could even wrap the preceding statement in another function. To do that, take a look at "Programming with dplyr".
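
As a rough illustration of such a wrapper, the sketch below uses the {{ }} (embrace) operator that dplyr supports for passing column names into functions. The function name add_word_freq, its arguments, and the output column names FreqPerTweet and FreqTotal are all hypothetical.

```r
library(dplyr)

# Hypothetical wrapper: adds per-row and per-group match counts for `pattern`.
# `group_col` and `text_col` are unquoted column names, forwarded with {{ }}.
add_word_freq <- function(data, group_col, text_col, pattern) {
    data %>%
        group_by({{ group_col }}) %>%
        mutate(FreqPerTweet = lengths(regmatches(
                   {{ text_col }},
                   gregexpr(pattern, {{ text_col }}, ignore.case = TRUE))),
               FreqTotal = sum(FreqPerTweet)) %>%
        ungroup()
}

# Toy data (invented) to show the call.
tweets <- data.frame(
    tweet_text = c("Senate bill", "senate SENATE", "dogs"),
    rdate = as.Date(c("2016-10-01", "2016-10-01", "2016-10-02")),
    stringsAsFactors = FALSE
)
res <- add_word_freq(tweets, rdate, tweet_text, "senate")
res
```

The output column names are fixed here to keep the sketch short; building them from the search word would need the name-injection syntax covered in the same dplyr guide.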

In any case, with this approach you can easily plot SenateFreqTotal using the ggplot2 package:

library(ggplot2)
# somedatefactor is a placeholder for your grouping column
data2 <- data %>% # it may help to reduce the size of the data frame before plotting
    select(SenateFreqTotal, somedatefactor) %>% 
    distinct()
ggplot(data2, aes(y = SenateFreqTotal, x = somedatefactor)) + 
    geom_bar(stat = "identity")

If you don't want to aggregate the frequencies, you can plot them like this:

ggplot(data, aes(y=SenateFreqPerTweet, x = tweet_time)) + 
    geom_bar(stat = "identity")
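
Since the question asks how the frequency changes over time, a line chart of daily totals may also be worth trying. The following is a self-contained sketch on invented toy data, assuming the rdate column from the question is used as the time axis; the names tweets, n_hits, daily and p are illustrative only.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the real tweet data frame (values are invented).
tweets <- data.frame(
    tweet_text = c("Senate votes today", "The senate and the SENATE",
                   "I like dogs", "Bill stalls in Senate"),
    rdate = as.Date(c("2016-10-01", "2016-10-01", "2016-10-02", "2016-10-03")),
    stringsAsFactors = FALSE
)

# Count matches per tweet, then total them per day.
daily <- tweets %>%
    mutate(n_hits = lengths(regmatches(
               tweet_text,
               gregexpr("senate", tweet_text, ignore.case = TRUE)))) %>%
    group_by(rdate) %>%
    summarise(SenateFreqTotal = sum(n_hits), .groups = "drop")

# One point per day, joined by a line to show the trend.
p <- ggplot(daily, aes(x = rdate, y = SenateFreqTotal)) +
    geom_line() +
    geom_point() +
    labs(x = "Date", y = "Mentions of \"Senate\"")
p
```

Days without any tweets do not appear in daily at all, so the line is drawn straight across such gaps rather than dropping to zero.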

Solution 1
1 2020-01-22 16:52:29
