Web scraping with R (I want to extract some tabular data from a website)

Question

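I'm having some trouble scraping data from a website, and I don't have much experience with web scraping. My plan is to use R to scrape data from the following site: https://www.myfxbook.com/forex-broker-swaps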

More precisely, I want to extract the forex broker swap comparison for all available currency pairs.

My approach so far:

  library(XML)
  url <- paste0("https://www.myfxbook.com/forex-broker-swaps")
  source <- readLines(url, encoding = "UTF-8")
  parsed_doc <- htmlParse(source, encoding = "UTF-8")
  test<-xpathSApply(parsed_doc, path = '/html/body/div[3]/div[6]/div/div/div/div/div/div/div/div/div[3]/div[4]/div/div[2]/div', xmlValue)

But this doesn't return the information I expected. Any help would be greatly appreciated. Thanks!
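The XPath in the question is an absolute path through a long chain of nested `div`s. Paths like this are fragile: if the HTML that `readLines()` downloads differs from the DOM a browser's developer tools show (for example, because some elements are added by JavaScript), a single wrong `div[n]` step is enough to return nothing useful. A minimal sketch of a sturdier alternative, still using the XML package, locates the table by its content with a relative XPath. The inline HTML string is a made-up stand-in for the real page, and the assumption that the swap table contains a cell reading `Broker` is inferred from the output in the answer below, not checked against the live site.

```r
library(XML)

# Made-up stand-in for the downloaded page: an unrelated table plus a
# simplified "swap" table buried in wrapper <div>s.
page <- '<html><body>
<div><table><tr><td>Menu</td></tr></table></div>
<div><div><table>
  <tr><th>Broker</th><th>EUR/USD</th></tr>
  <tr><td>Axi</td><td>0.17</td></tr>
</table></div></div>
</body></html>'

parsed_doc <- htmlParse(page, asText = TRUE, encoding = "UTF-8")

# Relative XPath: any <table> anywhere in the document that contains a
# <th> or <td> whose text is "Broker". The number of wrapper <div>s no
# longer matters.
swap_table <- "//table[.//*[self::th or self::td][normalize-space(.) = 'Broker']]"

headers <- xpathSApply(parsed_doc, paste0(swap_table, "//th"), xmlValue)
cells   <- xpathSApply(parsed_doc, paste0(swap_table, "//td"), xmlValue)

print(headers)
print(cells)
```

The unrelated "Menu" table is skipped because it has no `Broker` cell, so the result no longer depends on where the swap table sits in the page layout.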

Answer 1

How about this:

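library(dplyr)
library(rvest)

# Download and parse the page
h <- read_html("https://www.myfxbook.com/forex-broker-swaps")
h %>% html_table() %>%                           # every <table> on the page, as tibbles
  purrr::pluck(3) %>%                            # the swap comparison was the 3rd table
  setNames(paste(names(.), .[1,], sep="_")) %>%  # merge both header rows, e.g. "EUR/USD_Short"
  rename("Broker" = "_Broker") %>%               # first column had an empty top header
  filter(Broker != "Broker") %>%                 # drop the leftover header row
  mutate(across(-Broker, as.numeric))            # swap values were read as text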
#> # A tibble: 91 × 13
#>    Broker          `EUR/USD_Short` `EUR/USD_Long` `EUR/USD_Type` `GBP/USD_Short`
#>    <chr>                     <dbl>          <dbl>          <dbl>           <dbl>
#>  1 Axi                        0.17          -0.56              0           -0.18
#>  2 Tickmill                   0.24          -0.55              0           -0.22
#>  3 Blueberry Mark…            0.31          -0.55              0           -0.17
#>  4 Eightcap                   0.31          -0.55              0           -0.17
#>  5 Rakuten Securi…            0.19          -0.5               0            0   
#>  6 ACY Securities            -0.34          -3.75              3           -1.28
#>  7 AAAFx                      1.98          -6.42              1           -2.07
#>  8 MultiBank Group            0.3           -0.66              0           -0.12
#>  9 Just2Trade                 0.12          -0.9               0           -0.26
#> 10 Fusion Markets             0.31          -0.55              0           -0.15
#> # … with 81 more rows, and 8 more variables: `GBP/USD_Long` <dbl>,
#> #   `GBP/USD_Type` <dbl>, `USD/CAD_Short` <dbl>, `USD/CAD_Long` <dbl>,
#> #   `USD/CAD_Type` <dbl>, `USD/JPY_Short` <dbl>, `USD/JPY_Long` <dbl>,
#> #   `USD/JPY_Type` <dbl>

Created on 2022-05-25 by the reprex package (v2.0.1)
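The header-merging steps in this pipeline are easiest to follow on a tiny table. Judging from the `_Broker` rename and the `filter()` step, `html_table()` returned the swap table with the currency pair as the column name (repeated once per sub-column, with an empty name above `Broker`) and the second header line (`Broker`, `Short`, `Long`, ...) as the first data row. The sketch below rebuilds that assumed shape by hand with made-up values, so it runs offline, and then applies the same steps as the answer.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for html_table() output: duplicate/empty column names are
# kept ("minimal" name repair), and row 1 holds the second header line.
raw <- as_tibble(
  setNames(
    list(c("Broker", "Axi", "Tickmill"),
         c("Short", "0.17", "0.24"),
         c("Long", "-0.56", "-0.55")),
    c("", "EUR/USD", "EUR/USD")
  ),
  .name_repair = "minimal"
)

tidy <- raw %>%
  # Glue header line 1 (column name) to header line 2 (row 1): "EUR/USD_Short", ...
  setNames(paste(names(.), .[1, ], sep = "_")) %>%
  # The first column had an empty name, so it became "_Broker"; rename it.
  rename("Broker" = "_Broker") %>%
  # Drop every row whose Broker cell is "Broker", i.e. the header row.
  filter(Broker != "Broker") %>%
  # The text header row forced every column to character; convert the values.
  mutate(across(-Broker, as.numeric))

print(tidy)
```

Because the currency-pair name is repeated for each sub-column, pasting it together with the first row is what turns three identical `EUR/USD` names into unique ones such as `EUR/USD_Short` and `EUR/USD_Long`.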

Answer 1 was accepted (2022-05-25 10:28:08).