Scraping historical data behind a drop-down list

Question

I am trying to scrape data from https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process . As you can see, there is a drop-down menu for retrieving historical data. However, the page's link does not change with the selected date range, so I cannot write an rvest loop that visits each date and scrapes the data. How can I get the historical drug approval data from this page under these circumstances?

Answer 1

The URL for July 2019, for example, appears to be "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process&rptName=0&reportSelectMonth=7&reportSelectYear=2019", so you could loop over months and years, substitute them into the reportSelectMonth and reportSelectYear parameters of the URL, and call read_html() on each dynamically created URL.

If you want all of 2017 and 2018, for example, you could do:

library(rvest)
# No trailing "&" here: each parameter below supplies its own separator
baseUrl <- "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=reportsSearch.process&rptName=0"
for (year in 2017:2018) {
    for (month in 1:12) {
        # Build the URL for this month/year and download the page
        url <- paste0(baseUrl, "&reportSelectMonth=", month, "&reportSelectYear=", year)
        p <- read_html(url)
        # do stuff
    }
}
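
To fill in the "do stuff" step, the following is a minimal sketch rather than a tested solution. It assumes the report is rendered as a plain HTML table in the page returned for each URL and that the first table on the page is the one of interest; neither is confirmed above. The helper names build_report_url and scrape_month are made up for illustration, and the rvest package must be installed.

```r
# Build the report URL for one month/year.
# The parameter names come from the URL quoted in the answer above.
build_report_url <- function(month, year) {
  paste0(
    "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm",
    "?event=reportsSearch.process&rptName=0",
    "&reportSelectMonth=", month,
    "&reportSelectYear=", year
  )
}

# Every month/year combination for 2017 and 2018 (month varies fastest).
periods <- expand.grid(month = 1:12, year = 2017:2018)
urls <- mapply(build_report_url, periods$month, periods$year)

# Download one page and return its first HTML table, or NULL if the
# request fails or the page has no table.
# Assumption: the first table on the page holds the approval data.
scrape_month <- function(url) {
  tryCatch({
    page <- rvest::read_html(url)
    tables <- rvest::html_table(page)  # list with one entry per <table>
    if (length(tables) > 0) tables[[1]] else NULL
  }, error = function(e) NULL)
}
```

Calling results <- lapply(urls, scrape_month) would then give one list entry per month, with NULL wherever a page could not be read. Adding a short Sys.sleep() between requests is a reasonable courtesy to the server.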

Answer 1: accepted, 2019-10-06 20:39:32