Scraping an HTML table with rvest

Question

I am trying to use the rvest package to scrape a table:

library(rvest)

x <- read_html("http://www.jcb.jp/rate/usd04182016.html")
x %>% html_node(".CSVTable") %>% html_table()

The elements on the page look like this:

<table class="CSVTable">
 <tbody>...</tbody>
 <tbody class>...</tbody>
</table>

Why do I get the error "No matches"?

Answer 1

You're in luck (kind of). The site builds that table dynamically with an XHR request, so the table is not in the HTML that read_html() downloads, which is why the selector finds no matches. The good news is that the request simply fetches a CSV file.

library(rvest)
library(stringr)

pg <- read_html("http://www.jcb.jp/rate/usd04182016.html")

# the <script> tag that does the dynamic loading is in position 6 of the 
# list of <script> tags

fil <- str_match(html_text(html_nodes(pg, "script")[6]), "(/uploads/[[:digit:]]+\\.csv)")[,2]

df <- read.csv(sprintf("http://www.jcb.jp%s", fil), header=FALSE, stringsAsFactors=FALSE)

df <- setNames(df[,3:6], c("buy", "mid", "sell", "symbol"))

head(df)
##        buy      mid     sell symbol
## 1   3.6735   3.6736   3.6737    AED
## 2  68.2700  69.0700  69.8700    AFN
## 3 122.3300 122.6300 122.9300    ALL
## 4 479.5000 481.0000 482.5000    AMD
## 5   1.7710   1.8110   1.8510    ANG
## 6 165.0600 165.3100 165.5600    AOA
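The code above assumes the relevant <script> tag is always the sixth one. As a rough sketch of a less position-dependent approach, the pattern can be matched against every <script> tag and the first hit kept. The inline HTML below is a hypothetical stand-in for the downloaded page (including the made-up `loadRates` call), not the site's real source; it also shows that the `.CSVTable` selector matches nothing in static markup of this kind.

```r
library(rvest)
library(stringr)

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html <- '<html><head>
<script>var a = 1;</script>
<script>loadRates("/uploads/20160418.csv");</script>
</head><body><div id="rates"></div></body></html>'

pg <- read_html(html)

# The table is absent from the static HTML, so the selector matches no nodes.
n_tables <- length(html_nodes(pg, ".CSVTable"))

# Search every <script> tag for the CSV path instead of indexing position 6.
scripts <- html_text(html_nodes(pg, "script"))
hits <- str_match(scripts, "(/uploads/[[:digit:]]+\\.csv)")[, 2]

# Keep the first script that actually contained a match.
fil <- hits[!is.na(hits)][1]
```

With the real page, `pg` would come from read_html() on the URL, and `fil` could be passed to read.csv() exactly as in the answer's code.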

Since the table data is just a CSV file, you can also fetch it directly:

# the file has no header row, as in the code above
read.csv("http://www.jcb.jp/uploads/20160418.csv", header=FALSE, stringsAsFactors=FALSE)

(Just format the date properly in your requests.)
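One way to format the date might look like the sketch below. `jcb_csv_url` is a hypothetical helper name, and it assumes the `/uploads/YYYYMMDD.csv` pattern seen for 2016-04-18 holds for other dates, which the source does not confirm.

```r
# Hypothetical helper: build the CSV URL for a given date.
# Assumes the /uploads/YYYYMMDD.csv pattern holds for other dates.
jcb_csv_url <- function(date) {
  sprintf("http://www.jcb.jp/uploads/%s.csv", format(as.Date(date), "%Y%m%d"))
}

url <- jcb_csv_url("2016-04-18")

# The download itself is left commented out so the sketch runs offline:
# rates <- read.csv(url, header = FALSE, stringsAsFactors = FALSE)
```

Here `url` evaluates to the same address used in the direct read.csv() call above.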

Answer 1: 4 votes, accepted, 2016-04-19 11:39:52
