Creating a graph of the frequency of a specific word from a dataframe over a time period in R
Is there a way to search for a specific word in a Twitter timeline over a period of time?
I'm doing some Twitter analysis in R. Right now I fetch tweets from an account, but I want to search those tweets for one or more specific words and then plot how many times each word is repeated over a period of time, e.g. one week.
This is my script:
```r
# install/load libraries
library(wordcloud)
library(SnowballC)
library(rtweet)
library(tm)
library(RColorBrewer)
library(tidytext)
library(dplyr)
library(wordcloud2)
library(stringr)
library(qdapRegex)

# identification and retrieval of tokens
appname <- "AnalisisTwitSent"
consumer_key <- "CvgpjfxMIyUmg21HFPSKoFKr4"
consumer_secret <- "5VO0fWH6QK5jyYWx4PtABHyhvvZ5JyVjDNjQ2F36mDjYibu5g7"
access_token <- "2820319925-CTKOd9yiA8MmJlak1iXUDCbg2MKkKDlffjr9LyV"
access_secret <- "ZiZBJIjxqY9lNLemYdGxMD6BYM6eY43NyLGhRS4NRKu5S"

twitter_token <- create_token(app = appname,
                              consumer_key = consumer_key,
                              consumer_secret = consumer_secret,
                              access_token = access_token,
                              access_secret = access_secret,
                              set_renv = TRUE)

ver_palabras_comunes_nube <- function(busqueda, cantidad) {
  # get tweets
  # tweets <- get_timeline(usuario, n = cantidad,
  #                        parse = TRUE, check = TRUE,
  #                        include_rts = TRUE)
  tweets <- search_tweets(busqueda, cantidad, include_rts = FALSE)
  text <- str_c(tweets$text, collapse = " ")  # join with a space so adjacent tweets don't merge into one word

  # continue cleaning the text
  text <-
    text %>%
    str_remove_all("\\n") %>%              # remove linebreaks (str_remove would only hit the first one)
    rm_twitter_url() %>%                   # remove URLs
    rm_url() %>%
    str_remove_all("#\\S+") %>%            # remove any hashtags
    str_remove_all("@\\S+") %>%            # remove any @ mentions
    removeWords(stopwords("spanish")) %>%  # remove common words (a, the, it, etc.)
    removeNumbers() %>%
    stripWhitespace() %>%
    removeWords(c("amp"))                  # final cleanup of other small leftovers
  # assign the results back, otherwise these two cleaning steps are discarded
  text <- gsub("\\p{So}|\\p{Cn}", "", text, perl = TRUE)
  text <- rm_emoticon(text, replacement = "")

  # convert the data into a summary table
  textCorpus <-
    Corpus(VectorSource(text)) %>%
    TermDocumentMatrix() %>%
    as.matrix()
  textCorpus <- sort(rowSums(textCorpus), decreasing = TRUE)
  textCorpus <- data.frame(word = names(textCorpus), freq = textCorpus, row.names = NULL)

  wordcloud <- wordcloud2(data = textCorpus, minRotation = 0, maxRotation = 0)
  wordcloud
}
```
To get a plot of the frequency of specific words over time, you just need to count how often they occur in each time period and then plot those counts. I use the tidytext package here, which is very well suited for this. You could also consider simply using stringr::str_count() (although in that case watch out for, or correct for, tokenization). You put your code into a function, which isn't necessary in this case, but I wrote the code so that you can quickly put it back into a function if needed.
```r
library(rtweet)
library(tidyverse)
library(tidytext)

# define variables
busqueda <- "Poppycock"
cantidad <- 100
pattern <- c("is", "to")

# query tweets
tweets <- search_tweets(busqueda, cantidad, include_rts = FALSE)

# count the occurrence of the pattern words
pattern_df <- tweets %>%
  select(status_id, text, created_at) %>%         # only keep the data columns we need later
  unnest_tokens(word, text) %>%                   # split the text into tokens (words)
  filter(word %in% pattern) %>%                   # only keep words defined in pattern
  mutate(hour = lubridate::hour(created_at)) %>%  # extract the hour from created_at; use week here if you want
  count(word, hour)                               # count the words per hour

# plot
ggplot(pattern_df, aes(x = hour, y = n, fill = word)) +
  geom_col(position = "dodge")
```
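As a sketch of the stringr::str_count() alternative mentioned above: a word-boundary regex avoids counting partial matches (e.g. "is" inside "this"), and lubridate::floor_date() bins the tweets per week, as asked in the question. The `tweets` tibble below is toy data standing in for the result of `search_tweets()`; everything else uses standard stringr/lubridate/dplyr calls.

```r
library(dplyr)
library(stringr)
library(lubridate)
library(ggplot2)

# toy data standing in for the search_tweets() output
tweets <- tibble::tibble(
  text = c("this is a test", "is it? it is", "nothing to see here"),
  created_at = as.POSIXct(c("2023-01-02", "2023-01-03", "2023-01-10"), tz = "UTC")
)

pattern <- "is"

# count whole-word matches per tweet, then sum them per week
pattern_df <- tweets %>%
  mutate(
    week = floor_date(created_at, unit = "week"),
    n = str_count(text, regex(paste0("\\b", pattern, "\\b"), ignore_case = TRUE))
  ) %>%
  group_by(week) %>%
  summarise(n = sum(n), .groups = "drop")

ggplot(pattern_df, aes(x = week, y = n)) +
  geom_col()
```

Note that str_count() counts every occurrence within a tweet (here "is it? it is" contributes 2), whereas the unnest_tokens() approach above counts tokens after tidytext's own tokenization, so the two can differ slightly on edge cases like punctuation.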
Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.