
R Web scraping data from StockTwits website

I want to extract some information from tweets posted on the StockTwits platform. Here is an example tweet: https://stocktwits.com/3726859/message/469518468 I asked the same question before ( R How to web scrape data from StockTwits with RSelenium? ), but the StockTwits website has since changed and the same html_nodes() call no longer works. I would be grateful if someone could help me with the selector to pass to html_nodes().
I would like to read the following information: number of replies, number of reshares, number of likes.

This is what I have so far:

library(rvest)

read_html("https://stocktwits.com/SunAndStorm/message/499613811") |> 
  html_nodes()

The final result should be a dataframe, which should look like this:

# A tibble: 1 × 5
  Reply Reshare  Like Share Search
  <int>   <int> <int> <int>  <int>
1     5       0     1     0      0

I do not use html_nodes(); instead I locate the element by its XPath with RSelenium. The following code gives you the information you need:

library(RSelenium)

url <- "https://stocktwits.com/SunAndStorm/message/499613811"

# Set up driver
driver <- rsDriver(browser = "firefox", chromever = NULL)
remDr <- driver[["client"]]

# Go to site
remDr$navigate(url)

# Extract information using xpath
info <- remDr$findElement(using = "xpath", "/html/body/div[2]/div/div[2]/div[2]/div[2]/div/div/div/div[1]/div[1]/div/div[2]/article/div/div[5]")

Then you can call getElementText() on the element to read the information:

> info$getElementText()
[[1]]
[1] "4Comments\n0Reshares\n7Likes"

If you need help converting this string to a data frame, let me know and I can help you out, but I assume this is not the main problem.

Kind regards
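
For reference, the conversion could look roughly like the sketch below. It assumes the text always comes back as one "<number><label>" pair per line, exactly as in the output above. Abbreviated counts (something like "1.2k", if the site ever shows them) would need extra handling. The variable names are only illustrative.

```r
# Text as returned by info$getElementText()[[1]] in the output above
txt <- "4Comments\n0Reshares\n7Likes"

# One element per line, e.g. "4Comments"
parts <- strsplit(txt, "\n", fixed = TRUE)[[1]]

# Assumption: each line is leading digits (the count) followed by the label
counts <- as.integer(sub("^([0-9]+).*$", "\\1", parts))
labels <- sub("^[0-9]+", "", parts)

# One-row data frame with one integer column per label
result <- as.data.frame(as.list(setNames(counts, labels)))
result
```

With a live session, txt would be replaced by info$getElementText()[[1]].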

Look at the Network tab in your browser's developer tools and you will find their API. Call it with the ID of the tweet you are interested in.

I put together a starting point for you here. I couldn't find reshares and search, but I am sure they are in the response somewhere. Since you have thousands of tweets to gather information on, this method is more efficient than driving a browser.

library(tidyverse)
library(httr2)

get_stockwits <- function(id) {
  data <-
    str_c("https://api.stocktwits.com/api/2/messages/", id, "/conversation.json?limit=21") %>%
    request() %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE)

  tibble(
    tweet = data %>%
      getElement("message") %>%
      getElement("body"),
    reply = data %>%
      getElement("message") %>%
      getElement("conversation") %>%
      getElement("replies"),
    likes = data %>%
      getElement("message") %>%
      getElement("likes") %>%
      getElement("total"),
    comments = data %>%
      getElement("children") %>%
      getElement("messages") %>% 
      getElement("body")
  ) %>%
    nest(comments = comments)
}

get_stockwits(469518468)

# A tibble: 1 x 4
  tweet                             reply likes comments        
  <chr>                             <int> <int> <list>          
1 $GME going back in all this month     5     1 <tibble [2 x 1]>

Unnest the comments column to see the individual comments:

get_stockwits(469518468) %>% 
  unnest(comments)

# A tibble: 2 x 4
  tweet                             reply likes comments                     
  <chr>                             <int> <int> <chr>                        
1 $GME going back in all this month     5     1 @okkenny yeah with options   
2 $GME going back in all this month     5     1 @okkenny playing monthly only
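
Since the goal is thousands of tweets, here is a rough sketch of how the per-ID function could be applied to a whole vector of IDs. collect_messages() and its pause argument are hypothetical names introduced here, not part of any package. The one-second default pause is an arbitrary guess, as I do not know the API's actual rate limits. IDs whose request fails are simply skipped.

```r
library(purrr)
library(dplyr)

# Hypothetical helper: call `fetch` once per ID and stack the results.
# `fetch` is any function that takes one ID and returns a data frame,
# for example get_stockwits() from above.
collect_messages <- function(ids, fetch, pause = 1) {
  # possibly() turns an error for a single ID into NULL instead of aborting
  safe_fetch <- possibly(fetch, otherwise = NULL)

  ids %>%
    map(function(id) {
      Sys.sleep(pause)  # assumption: a short wait between requests is polite
      safe_fetch(id)
    }) %>%
    compact() %>%       # drop the IDs that failed
    bind_rows()
}
```

Usage would then be collect_messages(c(469518468, 499613811), get_stockwits), which returns one row per tweet with the comments still nested.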

