
Is there a way to search for a specific word in a Twitter timeline over a period of time?

I'm doing some Twitter analysis in R. Right now I fetch the tweets from one account, but I want to search those tweets for one or more specific words and then plot how many times each word is repeated over a period of time, e.g. one week.

Here is my script:

# load libraries
library(wordcloud)
library(SnowballC)
library(rtweet)
library(tm)
library(RColorBrewer)
library(tidytext)
library(dplyr)
library(wordcloud2)
library(stringr)
library(qdapRegex)

# Authentication: create the OAuth token
appname <- "AnalisisTwitSent"
consumer_key     <- "CvgpjfxMIyUmg21HFPSKoFKr4"
consumer_secret  <- "5VO0fWH6QK5jyYWx4PtABHyhvvZ5JyVjDNjQ2F36mDjYibu5g7"
access_token <- "2820319925-CTKOd9yiA8MmJlak1iXUDCbg2MKkKDlffjr9LyV"
access_secret <- "ZiZBJIjxqY9lNLemYdGxMD6BYM6eY43NyLGhRS4NRKu5S"

twitter_token <- create_token(app = appname, 
                           consumer_key = consumer_key, 
                           consumer_secret = consumer_secret,
                           access_token = access_token, 
                           access_secret = access_secret,
                           set_renv = TRUE)

ver_palabras_comunes_nube <- function(busqueda, cantidad) {

  # Get tweets
  #tweets <- get_timeline(usuario, n = cantidad, 
                     #parse = TRUE, check = TRUE,
                     #include_rts = TRUE)
  tweets <- search_tweets(busqueda, cantidad, include_rts = FALSE)

  text <- str_c(tweets$text, collapse = " ")  # use a space so tweets don't run together

  # continue cleaning the text
  text <-
    text %>%
    str_remove_all("\\n") %>%               # remove all linebreaks, not just the first
    rm_twitter_url() %>%                    # remove Twitter URLs
    rm_url() %>%                            # remove remaining URLs
    str_remove_all("#\\S+") %>%             # remove any hashtags
    str_remove_all("@\\S+") %>%             # remove any @ mentions
    removeWords(stopwords("spanish")) %>%   # remove common words (a, the, it etc.)
    removeNumbers() %>%
    stripWhitespace() %>%
    removeWords(c("amp"))                   # final cleanup of leftover entities

  # assign the results back (they were previously computed but discarded)
  text <- gsub("\\p{So}|\\p{Cn}", "", text, perl = TRUE)  # strip symbols/emoji
  text <- rm_emoticon(text, replacement = "")

  # Convert the data into a summary table
  textCorpus <- 
    Corpus(VectorSource(text)) %>%
    TermDocumentMatrix() %>%
    as.matrix()

  textCorpus <- sort(rowSums(textCorpus), decreasing=TRUE)
  textCorpus <- data.frame(word = names(textCorpus), freq=textCorpus, row.names = NULL)

  wordcloud <- wordcloud2(data = textCorpus, minRotation = 0, maxRotation = 0)
  wordcloud
}

To get a plot of specific words' frequency over time, you just need to count how often they occur in each time period and then plot them. I use the tidytext package here, which is very well suited for this. But you could also consider simply using stringr::str_count() (although in that case watch out for, or correct, the tokenization). You put your code in a function, which isn't necessary in this case, but I wrote the code so that you can quickly put it back into a function if you want.

library(rtweet)
library(tidyverse)
library(tidytext)

# define variables
busqueda <- "Poppycock"   
cantidad <- 100
pattern <- c("is", "to")

# query tweets
tweets <- search_tweets(busqueda, cantidad, include_rts = FALSE)


# count the occurrence of the pattern words
pattern_df <- tweets %>% 
  select(status_id, text, created_at) %>%          # only keep data columns we need later
  unnest_tokens(word, text) %>%                    # split the text into tokens (words)
  filter(word %in% pattern) %>%                    # only keep words defined in pattern
  mutate(hour = lubridate::hour(created_at)) %>%   # extract the hour from the created_at time, use week here if you want
  count(word, hour)                                # count the words per hour

# plot
ggplot(pattern_df, aes(x = hour, y = n, fill = word)) +
  geom_col(position = "dodge")
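The stringr::str_count() alternative mentioned above could look like the sketch below; it runs on a toy character vector (in practice this would be tweets$text from search_tweets()), and uses \b word boundaries so that, e.g., "is" does not also match inside "this" — the tokenization caveat noted earlier:

```r
library(stringr)

# toy data standing in for tweets$text (assumption: live data comes from search_tweets())
texts <- c("is this real? it is", "nothing to see", "to be or not to be")

# \\b word boundaries prevent matches inside longer words
n_is <- str_count(texts, "\\bis\\b")
n_to <- str_count(texts, "\\bto\\b")

sum(n_is)  # 2
sum(n_to)  # 3
```

str_count() returns one count per element, so you could keep these counts alongside created_at and aggregate them per hour or per week just like in the tidytext version.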
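If you want the "period of time, e.g. 1 week" from the question rather than per-hour counts, you can filter on created_at before counting. A minimal sketch with a toy data frame standing in for the rtweet result (the reference date here is an assumption for illustration; in real use you would compare against Sys.time()):

```r
library(dplyr)
library(lubridate)

# toy stand-in for the data frame returned by search_tweets()/get_timeline()
tweets <- tibble::tibble(
  text       = c("old tweet", "recent tweet one", "recent tweet two"),
  created_at = as.POSIXct("2024-01-15", tz = "UTC") - days(c(10, 3, 1))
)

# keep only tweets from the last week relative to the (illustrative) reference date
recent <- tweets %>%
  filter(created_at >= as.POSIXct("2024-01-15", tz = "UTC") - weeks(1))

nrow(recent)  # 2
```

The filtered data frame can then be piped into the unnest_tokens()/count() chain above in place of the full result.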

