How to scrape information from a website
I am trying to scrape data from the PHE website (https://coronavirus.data.gov.uk/details/deaths). I am after the number of deaths within 28 days of a positive test, by date reported, broken down by nation (the second interactive chart). I tried using SelectorGadget to pick out the data and put it into a table, but the result is empty. I have done this in the past and it worked fine, so I am not sure why it doesn't work this time. I suspect it might be because the page is an interactive dashboard. Any help will be appreciated.
library(rvest)
url <- "https://coronavirus.data.gov.uk/details/deaths"
webpage <- read_html(url)
data <- webpage %>%
html_nodes(".dgxcKs , .govuk-table__cell--date , .govuk-table__cell--numeric , .cQSaWH") %>%
html_table()
print(data)
All the data on that page is loaded as JSON after the page opens, so read_html(), which does not run JavaScript, never sees it. Find the relevant JSON request in the Network tab of your browser's developer tools and read it directly:
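# Query the dashboard's JSON API directly; areaType=overview returns UK-wide figures
data <- jsonlite::fromJSON('https://coronavirus.data.gov.uk/api/v1/data?filters=areaType=overview&structure=%7B%22date%22:%22date%22,%22areaName%22:%22areaName%22,%22newDeaths28DaysByPublishDate%22:%22newDeaths28DaysByPublishDate%22%7D')
# The rows are returned in the `data` element of the parsed list
head(data$data)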
# date areaName newDeaths28DaysByPublishDate
#1 2021-03-24 United Kingdom 98
#2 2021-03-23 United Kingdom 112
#3 2021-03-22 United Kingdom 17
#4 2021-03-21 United Kingdom 33
#5 2021-03-20 United Kingdom 96
#6 2021-03-19 United Kingdom 101
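The query above uses areaType=overview, which returns UK-wide totals, while the question asks for figures by nation. The `structure` parameter is just a URL-encoded JSON object mapping output names to metric names, so the URL can be built rather than copied by hand. A minimal sketch, assuming the API accepts areaType=nation (one of the area types it listed at the time) and that the endpoint is still reachable; the helper `build_phe_url` is hypothetical:

```r
library(jsonlite)

# Hypothetical helper: builds a query URL for the same endpoint used above.
# Assumes "nation" is an accepted areaType value; the API may no longer be live.
build_phe_url <- function(area_type, metrics) {
  # The API expects a JSON object of the form {"outputName": "metricName", ...}
  structure <- setNames(as.list(metrics), metrics)
  structure_json <- as.character(toJSON(structure, auto_unbox = TRUE))
  paste0(
    "https://coronavirus.data.gov.uk/api/v1/data",
    "?filters=areaType=", area_type,
    # reserved = TRUE percent-encodes {, ", :, and , in the JSON
    "&structure=", utils::URLencode(structure_json, reserved = TRUE)
  )
}

url <- build_phe_url("nation",
                     c("date", "areaName", "newDeaths28DaysByPublishDate"))
print(url)
```

The resulting URL can be passed to `jsonlite::fromJSON()` exactly as in the answer; if the assumption about areaType=nation holds, `data$data` should then contain one row per nation and date.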