R: get HTML data from a JavaScript action
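I would like to scrape some data from a page where a button has to be clicked (JavaScript) before the table becomes accessible.
When you are on http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/ you see a map, and the data table can be opened with the small "table" button on the left.
It opens a new window containing the result, and I would like to get this result into R. The URL of this new page is http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/table.html?th0, but the page does not work unless I arrive from the map page.
So I would like to know whether it is possible to simulate in R the effect of clicking this button, in order to get the same data.
I tried: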
library(RCurl)
path <- "http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/table.html?th0"
webpage <- getURL(path)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)
but the result obviously does not contain the table, only the script that is supposed to build it:
[1] "<!DOCTYPE HTML>"
[2] "<html>"
[3] "<meta http-equiv=\"Content-type\" content=\"text/html; charset=UTF-8\" />"
[4] "<link rel=\"stylesheet\" href=\"style.css\" />"
[5] "<link rel=\"stylesheet\" href=\"rectable.css\" />"
[6] "<script language=\"JavaScript\" type=\"text/javascript\">"
[7] "<!--"
[8] "function sortTable(theColumn,datatype,orderby) {"
[9] " document.getElementById(\"content\").innerHTML = \"Veuillez patientez ...\";"
[10] " var themaId = window.location.search.substr(1,window.location.search.length);"
[11] " var xslFile = \"styletable.xsl\";"
[12] " window.opener.mv_loadAttrTableFile(themaId,true);"
[13] " try {"
[14] "\ttry {"
[15] " var xslt = new ActiveXObject(\"Msxml2.XSLTemplate.4.0\");"
[16] " var xslDoc = new ActiveXObject(\"Msxml2.FreeThreadedDOMDocument.4.0\");"
[17] " } catch(e) {"
[18] " var xslt = new ActiveXObject(\"Msxml2.XSLTemplate\");"
[19] " var xslDoc = new ActiveXObject(\"Msxml2.FreeThreadedDOMDocument\");"
[20] " }"
[21] " xslDoc.async = false;"
[22] " xslDoc.resolveExternals = false;"
[23] " xslDoc.load(xslFile);"
[24] " xslt.stylesheet = xslDoc;"
[25] " var xslProc = xslt.createProcessor();"
[26] " xslProc.input = window.opener.mv_XMLFileArray[themaId].XMLFile;"
[27] " if (theColumn) {"
[28] " xslProc.addParameter(\"field\",\"f\" + (parseInt(theColumn) - 1));"
[29] " xslProc.addParameter(\"datatype\",datatype);"
[30] " xslProc.addParameter(\"orderby\",orderby);"
[31] " }"
[32] " xslProc.transform();"
[33] " content.innerHTML = xslProc.output;"
[34] " } "
[35] ""
[36] " catch(e) {"
[37] " var xsltProcessor = new XSLTProcessor(); "
[38] " var xslStylesheet = window.opener.mv_loadXMLDoc(window.opener.mv_Doc.BaseURL + \"embfiles/\" + xslFile,\"xml\");"
[39] " try {"
[40] " xsltProcessor.importStylesheet(xslStylesheet);"
[41] " }"
[42] " catch(err) {"
[43] " var xslStylesheet = document.implementation.createDocument(\"\", \"\", null);"
[44] " xslStylesheet.async = false;"
[45] " xslStylesheet.load(xslFile);"
[46] " xsltProcessor.importStylesheet(xslStylesheet);"
[47] " }"
[48] " if (theColumn) {"
[49] " xsltProcessor.setParameter(null,\"field\",\"f\" + (parseInt(theColumn) - 1));"
[50] " xsltProcessor.setParameter(null,\"datatype\",datatype);"
[51] " xsltProcessor.setParameter(null,\"orderby\",orderby);"
[52] " }"
[53] " var resultFragment = xsltProcessor.transformToFragment(window.opener.mv_XMLFileArray[themaId].XMLFile,document);"
[54] " document.getElementById(\"content\").innerHTML = \"\";"
[55] " document.getElementById(\"content\").appendChild(resultFragment);"
[56] " }"
[57] "}"
[58] "//-->"
[59] "</script>"
[60] "<title>Table attributaire</title>"
[61] "</head>"
[62] "<body onload=\"sortTable();\">"
[63] "<div id=\"content\">Veuillez patientez ...</div>"
[64] "</body>"
[65] "</html>"
[66] ""
Any ideas?
Thanks.
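You can use the "Inspect Element" tool in Chrome to work out which requests are triggered when you click the table button.
The data can be retrieved directly from the file requested by this AJAX call:
http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/th0.xml
From there you can parse the document.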
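The script in the pop-up reads its theme id from the query string (`window.location.search.substr(1, ...)`, here `th0`), and the data file above is called `th0.xml` in the same `embfiles/` directory. A minimal sketch of that mapping follows, assuming other tables on the site use the same `<id>.xml` naming. That pattern is inferred from this single example, and the helper name `data_url_from_table_url` is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def data_url_from_table_url(table_url):
    """Guess the XML data URL behind a 'table.html?<id>' pop-up URL.

    Assumption: the data file is named '<id>.xml' and sits next to
    table.html, as observed for 'th0' in this thread.
    """
    theme_id = urlsplit(table_url).query  # text after '?', e.g. 'th0'
    if not theme_id:
        raise ValueError("no theme id in the query string: %r" % table_url)
    # urljoin replaces the last path segment (table.html) and drops the query.
    return urljoin(table_url, theme_id + ".xml")


table_url = (
    "http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/"
    "c_vin01_coop_com07/embfiles/table.html?th0"
)
print(data_url_from_table_url(table_url))
```

For the URL from the question this prints the `th0.xml` address given above, so the same helper could be tried on other map pages before checking the result in the browser's network tab.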
To parse XML or HTML, the XML package is a useful tool. Here is a proof of concept showing how to get the title from the XPath of the element you want.
> library(XML)
> library(RCurl)
> url <- "http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/th0.xml"
> doc <- htmlTreeParse(url, useInternalNodes = TRUE)
> title <- xpathSApply(doc, "//title[@id='titth0']", fun=xmlValue)
> title
[1] "Quantité livrée à la cave coopérative (hl)"
Scraping with Python and BeautifulSoup:
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/th0.xml"
# html.parser lowercases tag names, so the first-column elements are found as 'f0'
soup = BeautifulSoup(urlopen(url), "html.parser")
for f0 in soup.find_all("f0"):
    print(f0.text)
Output:
Commune
07- BOURG-SAINT-ANDEOL
07- VILLENEUVE-DE-BERG
07- LABLACHERE
...
07- BERRIAS-ET-CASTELJAU
07- BESSAS
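The XSL code in the pop-up addresses columns as `"f" + (column - 1)`, so the remaining columns are presumably stored as `f1`, `f2`, ... next to `f0`. Below is a rough sketch, using only the standard library, that groups sibling `f<n>` elements into rows. The sample XML is invented for illustration: the `<data>` and `<row>` wrappers, the `f1` header and the `f1` values are assumptions, not the real file's layout. Only the title text and the `f0` values are taken from the outputs above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: wrapper tags and every f1 value are made up.
SAMPLE = """<data>
  <title id="titth0">Quantité livrée à la cave coopérative (hl)</title>
  <row><f0>Commune</f0><f1>Valeur</f1></row>
  <row><f0>07- BOURG-SAINT-ANDEOL</f0><f1>10</f1></row>
  <row><f0>07- BESSAS</f0><f1>20</f1></row>
</data>"""

FIELD_TAG = re.compile(r"f(\d+)")


def extract_table(xml_text):
    """Return (title, rows); each row lists the f0, f1, ... texts in order."""
    root = ET.fromstring(xml_text)
    title_node = root.find(".//title[@id='titth0']")
    title = title_node.text if title_node is not None else None

    rows = []
    for parent in root.iter():
        fields = {}
        for child in parent:
            match = FIELD_TAG.fullmatch(child.tag)
            if match:
                fields[int(match.group(1))] = (child.text or "").strip()
        if fields:  # this element directly holds f<n> children: treat it as a row
            rows.append([fields[i] for i in sorted(fields)])
    return title, rows


title, rows = extract_table(SAMPLE)
header, body = rows[0], rows[1:]
print(title)
print(header)
for row in body:
    print(row)
```

Because rows are detected by their `f<n>` children rather than by a fixed tag name, the sketch does not depend on what the real row element is called. For the live file, the bytes returned by `urllib.request.urlopen(url).read()` could be passed to `extract_table`, provided the file is well-formed XML.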