Scraping page with drop-down menus in R

I am trying to use the RSelenium package in R to scrape the following page: http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx. I am interested in all combinations of the drop-down selections, but I keep getting the following error when I run the code below:

Couldnt connect to host on http://localhost:4444/wd/hub.Please ensure a Selenium server is running.
Error in queryRD(paste0(serverURL, "/session"), "POST", qdata = toJSON(serverOpts)) :

 library(RSelenium)
 library(XML)
 library(magrittr)

 checkForServer()
 startServer()
 remDrv<-remoteDriver()
 remDrv$open()
 remDrv$navigate("http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx")

Any help would be appreciated.

Use an intermediary such as Burp Suite to capture what's going on, and use the results in combination with rvest's html_session and/or httr's POST.

In this case, you'd see that your original URL contains the initial <select> menu, and you'd also see that selecting an option issues a POST to:

http://www.wbsec.gov.in/(S(njkinc55hbv2hw55xksxdv45))/DetailedResult/Detailed_gp.aspx

with a number of the hidden variables from the original form element, as well as ddldistrict, ddlblock and ddlgp. The response contains the subsequent <select> menu options.
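As a rough illustration of the hidden-variable step, the sketch below collects every hidden input from a parsed page and merges it with a dropdown selection before posting. The helper names hidden_fields and post_selection are made up for this example, and the inline HTML and its field names (__VIEWSTATE, __EVENTVALIDATION) are stand-ins of the kind ASP.NET pages usually carry, not values captured from the real site. The real form may need additional fields that only the captured traffic will reveal.

```r
library(rvest)

# Hypothetical helper: name/value pairs of all hidden <input> elements.
hidden_fields <- function(page) {
  inputs <- html_nodes(page, "input[type='hidden']")
  values <- html_attr(inputs, "value")
  values[is.na(values)] <- ""  # inputs that have no value attribute
  as.list(setNames(values, html_attr(inputs, "name")))
}

# Hypothetical helper: POST the hidden fields plus the chosen dropdown values.
# `selection` is a named list such as list(ddldistrict = "01").
post_selection <- function(url, page, selection) {
  httr::POST(url, body = c(hidden_fields(page), selection), encode = "form")
}

# Stand-in HTML (assumed structure, not copied from the real page).
demo <- read_html('<form>
  <input type="hidden" name="__VIEWSTATE" value="abc"/>
  <input type="hidden" name="__EVENTVALIDATION" value="xyz"/>
  <select name="ddldistrict"><option value="01">District A</option></select>
</form>')

fields <- hidden_fields(demo)
names(fields)
```

Against the live site, page would be the parsed response of the previous request, and post_selection(url, page, list(ddldistrict = "01")) would send one selection; "01" is a placeholder value here.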

Use rvest to get the value attribute of each dropdown and make subsequent POSTs to the Detailed_gp.aspx URL until you've got all the combinations.
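Reading the value attributes could look something like the sketch below. It assumes the dropdowns are <select> elements whose name attributes match the POST variables (ddldistrict and so on) and that the first option is a placeholder with value "0"; both are guesses that should be checked against the real markup. The helper name option_values and the inline HTML are made up for illustration.

```r
library(rvest)

# Hypothetical helper: value attributes of every <option> in a named <select>.
option_values <- function(page, select_name) {
  css <- sprintf("select[name='%s'] option", select_name)
  html_attr(html_nodes(page, css), "value")
}

# Stand-in HTML (assumed structure, not copied from the real page).
demo <- read_html('<select name="ddldistrict">
  <option value="0">Select</option>
  <option value="01">District A</option>
  <option value="02">District B</option>
</select>')

districts <- option_values(demo, "ddldistrict")
districts <- districts[districts != "0"]  # drop the assumed placeholder entry
districts
```

Each district value would then be posted in turn, the response parsed for the ddlblock options, and so on down to ddlgp, which gives a nested loop over the three dropdowns.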

You'll probably get a Selenium answer, but this problem only requires posting to forms, which is something httr and rvest excel at.

You don't seem to have set up Selenium properly. Make sure you have Selenium downloaded and RSelenium loaded in R. This link might be helpful.

Once Selenium is set up properly, all you have to do is find the CSS selectors (selectorgadget is a great tool for this), send the required information to the dropdowns, scrape the website and repeat. I would do this for all three dropdowns.
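A minimal sketch of that workflow with RSelenium follows. It assumes a local Firefox installation and that the dropdowns can be addressed by name (ddldistrict and so on); the helper names option_selector, choose_option and scrape_district are made up for this example, and the fixed two-second wait is a crude placeholder. As far as I know, checkForServer() and startServer() are defunct in recent RSelenium releases in favour of rsDriver(), which is why the sketch uses the latter.

```r
# Hypothetical helper: CSS selector for one <option> of a named <select>.
option_selector <- function(select_name, value) {
  sprintf("select[name='%s'] option[value='%s']", select_name, value)
}

# Hypothetical helper: click one option and return the resulting page source.
choose_option <- function(remDr, select_name, value) {
  opt <- remDr$findElement(using = "css selector",
                           value = option_selector(select_name, value))
  opt$clickElement()
  Sys.sleep(2)  # crude wait for the page to reload; tune as needed
  remDr$getPageSource()[[1]]
}

# Defined but not called here: running it needs a local browser and driver.
scrape_district <- function(url, district) {
  drv <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remDr <- drv$client
  # Close the browser first, then stop the Selenium server.
  on.exit({
    remDr$close()
    drv$server$stop()
  })
  remDr$navigate(url)
  choose_option(remDr, "ddldistrict", district)
}

option_selector("ddldistrict", "01")
```

The returned page source can be handed to rvest::read_html() to pull out the next dropdown's options or the result table.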
