
Scraping historical data using a drop-down list

I am trying to scrape data from https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process. As you can see, there is a drop-down menu for selecting historical data. However, the link does not appear to change with the selected date range, so I cannot write an rvest loop that visits each date and scrapes the data. How can I get the historical drug approval data from this page under these circumstances?

The URL for July 2019, for example, appears to be "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process&rptName=0&reportSelectMonth=7&reportSelectYear=2019", so you could loop over months and years, substitute them into the reportSelectMonth and reportSelectYear parameters of the URL, and call read_html() on each dynamically created URL.

If you want all of 2017 and 2018, for example, you could do:

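library(rvest)

# Base URL without the month and year parameters (no trailing "&",
# since each parameter appended below brings its own "&")
baseUrl <- "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process&rptName=0"
for (year in 2017:2018) {
    for (month in 1:12) {
        # Build the URL for this month and year, then fetch the page
        url <- paste0(baseUrl, "&reportSelectMonth=", month, "&reportSelectYear=", year)
        p <- read_html(url)
        # do stuff
    }
}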
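
One possible way to fill in the "do stuff" step is sketched below. It assumes each monthly report is rendered as an ordinary HTML table and that the first table on the page is the one of interest; neither has been verified against the live site. The helper names build_report_url, fetch_report and scrape_all are hypothetical, not part of rvest.

```r
library(rvest)

# Base URL taken from the answer, without a trailing "&".
base_url <- "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process&rptName=0"

# Hypothetical helper: build the report URL for one month and year.
build_report_url <- function(month, year) {
  paste0(base_url, "&reportSelectMonth=", month, "&reportSelectYear=", year)
}

# Hypothetical helper: return the first HTML table on the page, or NULL
# if the request fails or the page contains no table.
# Assumption: the report data lives in the first <table> element.
fetch_report <- function(month, year) {
  page <- tryCatch(read_html(build_report_url(month, year)),
                   error = function(e) NULL)
  if (is.null(page)) return(NULL)
  tables <- html_table(page)
  if (length(tables) == 0) return(NULL)
  tables[[1]]
}

# One row per month/year combination for 2017 and 2018.
periods <- expand.grid(month = 1:12, year = 2017:2018)

# Hypothetical helper: fetch every period, pausing between requests.
scrape_all <- function(periods, pause = 1) {
  results <- vector("list", nrow(periods))
  for (i in seq_len(nrow(periods))) {
    # Assign via list() so that a NULL result keeps its slot
    results[i] <- list(fetch_report(periods$month[i], periods$year[i]))
    Sys.sleep(pause)
  }
  names(results) <- sprintf("%d-%02d", periods$year, periods$month)
  results
}
```

Calling results <- scrape_all(periods) would then return a named list with one entry per month, where failed or empty pages are left as NULL. The pause between requests is a courtesy to the server and can be adjusted.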
