Scraping historical data using a drop-down list
I am trying to scrape data from this page: https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process
As you can see, there is a drop-down menu for selecting historical data. However, the link does not appear to change with the date range, so I am not able to write an rvest loop that goes to each relevant date and scrapes the data. How can I get the historical drug approval data from this page under these circumstances?
The URL for July 2019, for example, appears to be "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process&rptName=0&reportSelectMonth=7&reportSelectYear=2019", so you could loop over months and years, substitute them into the reportSelectMonth and reportSelectYear parameters of the URL, and call read_html() on each dynamically created URL.
If you want all of 2017 and 2018, for example, you could do:
library(rvest)
# Base URL without the month and year parameters
baseUrl <- "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process&rptName=0"
for (year in 2017:2018) {
  for (month in 1:12) {
    # Append the month and year as query parameters
    url <- paste0(baseUrl, "&reportSelectMonth=", month, "&reportSelectYear=", year)
    p <- read_html(url)
    # do stuff
  }
}
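As a rough sketch of what the "do stuff" step might look like, the snippet below builds every month/year URL up front and defines a helper that pulls the tables from one page. It assumes the report is rendered as ordinary HTML <table> elements, which has not been verified here; build_report_url and scrape_month are hypothetical helper names, and the one-second pause is an arbitrary courtesy delay. The rvest functions are called with the rvest:: prefix, so the package must be installed.

```r
# Build one report URL; parameter names are taken from the URL quoted above.
build_report_url <- function(month, year) {
  paste0(
    "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm",
    "?event=reportsSearch.process&rptName=0",
    "&reportSelectMonth=", month,
    "&reportSelectYear=", year
  )
}

# Every month/year combination for 2017 and 2018 (month varies fastest).
periods <- expand.grid(month = 1:12, year = 2017:2018)
urls <- mapply(build_report_url, periods$month, periods$year)

# Fetch one page and return all of its tables as a list of data frames.
# Assumption: the results are plain <table> elements in the page HTML.
scrape_month <- function(url) {
  Sys.sleep(1)  # arbitrary pause between requests
  page <- rvest::read_html(url)
  rvest::html_table(rvest::html_elements(page, "table"))
}

# Uncomment to actually download the 24 pages:
# results <- lapply(urls, scrape_month)
```

If the tables turn out to be loaded by JavaScript rather than present in the HTML, html_table() will return an empty list and a different approach would be needed.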