Scraping a web page with JavaScript links

Question

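I am using R for web scraping. The information I need sits behind the links on this web page. However, when I click a link, it leads back to the page I am already on. How can I follow the links and scrape what is behind them until I reach the table with the information I need? I started using R a few months ago and I know about httr, Curl and other packages, but I have not been able to scrape this page. I need output like this (after clicking "Todo el territorio" and selecting Tipo de estudios: "Bachillerato"):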

Provincia|Localidad|Denominacion Generica|Denominacion Especifica|Codigo|Naturaleza
Almería|Adra|Instituto de Educación Secundaria|Abdera|04000110|Centro público
Almería|Adra|Instituto de Educación Secundaria|Gaviota|04000134|Centro público

...

This would be my usual script using the Selenium package, but it does not work. I am open to any option:

library(RSelenium)
library(XML)
library(magrittr)

RSelenium::checkForServer()
RSelenium::startServer()
remDrv <- RSelenium::remoteDriver(remoteServerAddr = "localhost", port = 4444, browserName = "chrome")
remDrv$open()

remDrv$navigate('https://www.educacion.gob.es/centros/selectaut.do')
remDrv$findElement(using = "xpath", "//select[@name = '.listado-inicio']/option[@value = ('02','00')]")$clickElement()

...

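Or something like that. I found something similar to this script while looking through other topics on Stack Overflow, but I got nowhere with it. I am open to solutions based on other scripts. Thank you very much.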

Answer 1

Using RSelenium to navigate the site, you can do something like this:

library(RSelenium)
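library(XML)  # provides htmlParse() and readHTMLTable(), which are used below
#start RSelenium (checkForServer()/startServer() come from older RSelenium releases;
#newer releases replace them with rsDriver())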
checkForServer()
startServer()
remDr <- remoteDriver()
remDr$open()

remDr$navigate('https://www.educacion.gob.es/centros/selectaut.do')

#Click on the todo el territorio link
remDr$findElement(using = "xpath", "//a[text()='Todo el territorio']")$clickElement()

#select the Bachillerato option (has a value of 133) and click on the search button
remDr$findElement(using = "xpath", "//select[@id='comboniv']/option[@value='133']")$clickElement()
remDr$findElement(using = "xpath", "//input[@id='idGhost']")$clickElement()

#Click on the show results button
remDr$findElement(using = "xpath", "//input[@title='Buscar']")$clickElement()

#parse the html and get the table
doc <- htmlParse(remDr$getPageSource()[[1]],encoding="UTF-8")
data <- readHTMLTable(doc)$matcentro
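
The answer stops once the table is in `data`. To get from there to the pipe-delimited output requested in the question, something like the following base R sketch might work. It does not need a browser: the `data` object here is a hand-made stand-in, and its column names and the two rows are assumptions copied from the sample output in the question, so the real table returned by readHTMLTable() may well have different headers.

```r
# Stand-in for the `data` object from the answer (assumed structure; the real
# table's column names are not shown in the answer).
data <- data.frame(
  Provincia = c("Almería", "Almería"),
  Localidad = c("Adra", "Adra"),
  "Denominacion Generica" = c("Instituto de Educación Secundaria",
                              "Instituto de Educación Secundaria"),
  "Denominacion Especifica" = c("Abdera", "Gaviota"),
  Codigo = c("04000110", "04000134"),
  Naturaleza = c("Centro público", "Centro público"),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Convert every column to trimmed character strings. as.character() guards
# against factor columns, and keeping Codigo as text preserves leading zeros.
data[] <- lapply(data, function(col) trimws(as.character(col)))

# Build one pipe-delimited line for the header and one per row.
header <- paste(names(data), collapse = "|")
rows <- do.call(paste, c(unname(as.list(data)), sep = "|"))
out_lines <- c(header, rows)

# Print to the console; writeLines(out_lines, "centros.txt") would write the
# same lines to a file (the file name is only an example).
writeLines(out_lines)
```

Keeping Codigo as character rather than numeric matters here, because a numeric conversion would silently drop the leading zero in codes such as 04000110.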
