Using rvest to scrape ASX

I'm trying to scrape data from the ASX (Australian Stock Exchange) site. For example, on the BHP page on ASX, at the bottom of the page is a collection of fundamentals data. The selector for the values, e.g. EPS, is:

#company_key_statistics > div > div.panel-body.row > div:nth-child(3) > table > tbody > tr:nth-child(8) > td

I tried:

library(rvest)
ASX_bhp <- read_html("https://www2.asx.com.au/markets/company/bhp")
ASX_data <- ASX_bhp |> html_elements("td") |> html_text()

Instead of "td", I have also tried "tr", "#company_key_statistics", and the whole selector string. However, all of them return an empty character vector. I also tried html_nodes instead of html_elements.

How should I extract fundamental data from this site?

All of that data is fetched and rendered through JavaScript, so it's not available to rvest (at least not through that URL). But you can use their API:

library(jsonlite)
bhp <- fromJSON("https://asx.api.markitdigital.com/asx-research/1.0/companies/bhp/key-statistics")
bhp$data$earningsPerShare
#> [1] 5.95708

Created on 2022-09-19 with reprex v2.0.2
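
If you need the same figure for more than one company, the call above can be wrapped in a small helper. This is only a sketch under some assumptions: `key_stats_url()` and `fetch_key_stats()` are hypothetical names, and it assumes that other tickers (here "cba") follow the same URL pattern and expose the same `earningsPerShare` field as BHP, which I have not verified.

```r
library(jsonlite)

# Hypothetical helper: build the key-statistics URL for a ticker.
# Assumes every company uses the same path pattern as BHP.
key_stats_url <- function(ticker) {
  paste0(
    "https://asx.api.markitdigital.com/asx-research/1.0/companies/",
    tolower(ticker),
    "/key-statistics"
  )
}

# Hypothetical helper: return the `data` element of the response,
# or NULL if the request or the JSON parsing fails.
fetch_key_stats <- function(ticker) {
  tryCatch(
    fromJSON(key_stats_url(ticker))$data,
    error = function(e) {
      message("Request failed for ", ticker, ": ", conditionMessage(e))
      NULL
    }
  )
}

# "cba" is an assumed second ticker, used for illustration only.
tickers <- c("bhp", "cba")
stats <- lapply(tickers, fetch_key_stats)
names(stats) <- tickers

# Pull out EPS per ticker; NA where the request failed or the field is missing.
eps <- vapply(
  stats,
  function(x) {
    val <- x$earningsPerShare
    if (length(val) != 1) NA_real_ else as.numeric(val)
  },
  numeric(1)
)
eps
```

The `tryCatch()` is there so that one failed request (network problem, unknown ticker, changed endpoint) yields `NA` for that ticker instead of aborting the whole loop.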
