Creating a graph of frequency of a specific word from a dataframe over a time period in R
I have a data frame of tweets in R that looks like this:
tweet_text tweet_time rdate twt
<chr> <dttm> <date> <dbl>
1 No New England cottage is complete without nautical t.. 2016-08-25 09:21:00 2016-08-25 1
2 Justice Scalia spent his last hours with members of co… 2016-11-24 16:28:00 2016-11-24 1
3 WHAT THE FAILED OKLAHOMA ABORTION BILL TELLS US http:/… 2016-11-24 16:27:00 2016-11-24 1
4 Bipartisan bill in US Senate to restrict US arms sales… 2016-10-26 07:03:00 2016-10-26 1
5 #MustResign campaign is underway with the heat p his S… 2016-10-01 08:15:00 2016-10-01 1
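Each tweet has a specific date, and all tweets in the data frame fall within a one-year period. I want to find the frequency of a specific word (for example, "Senate") over the whole period and plot a graph that captures how that frequency changes over time. I am new to R and can only think of overly complicated approaches, but I am sure there must be a very simple way.
I'd appreciate any suggestions.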
# Count case-insensitive matches of `pattern` in each element of `text`
textFreq <- function(pattern, text){
  freq <- gregexpr(pattern = pattern, text = text, ignore.case = TRUE)
  freq <- lapply(freq, FUN = function(x){
    # gregexpr() returns a single -1 when there is no match
    if(length(x)==1&&x==-1){
      return(0)
    } else {
      return(length(x))
    }
  })
  freq <- unlist(freq)
  return(freq)
}
test.text <- c("senate.... SENate.. sen","Working in the senate...", "I like dogs")
textFreq(pattern = "senate", test.text)
# [1] 2 1 0
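For comparison, the same per-string counts can be obtained more compactly in base R by extracting the matches and taking their lengths. This is only a sketch of an alternative, not a replacement for the function above; `countMatches` is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: count case-insensitive matches per string
countMatches <- function(pattern, text) {
  # regmatches() returns one character vector of matches per input string
  # (an empty vector when nothing matches), so lengths() gives the counts
  lengths(regmatches(text, gregexpr(pattern, text, ignore.case = TRUE)))
}

test.text <- c("senate.... SENate.. sen", "Working in the senate...", "I like dogs")
countMatches("senate", test.text)
# [1] 2 1 0
```

Like `textFreq`, this counts substring matches, so a pattern such as "senate" would also match inside a longer word; a pattern like "\\bsenate\\b" could be used if whole words are wanted.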
You can use dplyr: group by the time period and use mutate.
library(dplyr)
library(magrittr)
# *somedatefactor* is a placeholder for whatever grouping column you choose
data <- data %>%
  group_by(*somedatefactor*) %>% # e.g. if you wanted to aggregate every 10 days or something
  mutate(SenateFreqPerTweet = textFreq(pattern = "Senate", text = tweet_text),
         SenateFreqTotal = sum(SenateFreqPerTweet)) # sum is computed within the current grouping
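As one concrete possibility for the grouping variable, the sketch below buckets tweets by calendar month using the `rdate` column from the question. The toy data, the `tweets`, `month`, `n_hits`, and `monthly` names, and the choice of monthly buckets are all assumptions made for illustration; it uses `summarise()` instead of `mutate()` so that the result has one row per month.

```r
library(dplyr)

# Toy data standing in for the real tweet data frame (values are made up)
tweets <- data.frame(
  tweet_text = c("Senate votes today", "I like dogs",
                 "The senate and the Senate", "No match here"),
  rdate = as.Date(c("2016-08-25", "2016-08-30", "2016-10-01", "2016-10-26")),
  stringsAsFactors = FALSE
)

monthly <- tweets %>%
  mutate(
    # Assumed grouping: first day of each tweet's month
    month = as.Date(format(rdate, "%Y-%m-01")),
    # Case-insensitive count of "senate" in each tweet
    n_hits = lengths(regmatches(tweet_text,
                                gregexpr("senate", tweet_text, ignore.case = TRUE)))
  ) %>%
  group_by(month) %>%
  summarise(SenateFreqTotal = sum(n_hits), .groups = "drop")

monthly
```

Months with no tweets at all do not appear in the result; whether they should be filled in with zeros depends on how the plot is meant to read.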
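You could even wrap the preceding statement in another function. To do that, see "Programming with dplyr".
Either way, with this approach you can easily plot SenateFreqTotal using the ggplot2 package.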
library(ggplot2)
data2 <- data %>% # may be helpful to reduce the size of the dataframe before plotting
  select(SenateFreqTotal, *somedatefactor*) %>%
  distinct()
ggplot(data2, aes(y = SenateFreqTotal, x = *somedatefactor*)) +
  geom_bar(stat = "identity")
If you don't want to aggregate the frequencies, you can plot them like this:
ggplot(data, aes(y=SenateFreqPerTweet, x = tweet_time)) +
geom_bar(stat = "identity")
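To show what the aggregated plot might look like with a real date axis, here is a minimal sketch that starts from an already aggregated data frame. The `monthly` data frame and its values are made up for illustration, and the axis labels and date format are assumptions; `geom_col()` is equivalent to `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Made-up monthly totals, one row per month
monthly <- data.frame(
  month = as.Date(c("2016-08-01", "2016-09-01", "2016-10-01", "2016-11-01")),
  SenateFreqTotal = c(1, 0, 2, 3)
)

p <- ggplot(monthly, aes(x = month, y = SenateFreqTotal)) +
  geom_col() +                              # bar heights taken directly from the data
  scale_x_date(date_labels = "%b %Y") +     # e.g. "Aug 2016"
  labs(x = "Month", y = "Occurrences of \"senate\"")

print(p)
```

Swapping `geom_col()` for `geom_line()` would give a line chart instead, which some readers find easier to follow for change over time.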