R: get HTML data from a JavaScript action
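I want to scrape some data from a page where a (JavaScript) button has to be clicked before the table becomes accessible.
On http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/ you can open the map and a data table with the small "table" button on the left.
It opens a new window with the results, and I would like to get these results in R. The URL of this new page is http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/table.html?th0, but the page does not work unless I come from the map page.
So I wonder whether it is possible to reproduce in R the effect of clicking this button, so as to get the same data.
I tried: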
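library(RCurl)  # provides getURL()
path <- "http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/table.html?th0"
webpage <- getURL(path)
# Split the downloaded HTML into a character vector, one element per line
webpage <- readLines(tc <- textConnection(webpage)); close(tc)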
but it clearly doesn't work. The result is:
[1] "<!DOCTYPE HTML>"
[2] "<html>"
[3] "<meta http-equiv=\"Content-type\" content=\"text/html; charset=UTF-8\" />"
[4] "<link rel=\"stylesheet\" href=\"style.css\" />"
[5] "<link rel=\"stylesheet\" href=\"rectable.css\" />"
[6] "<script language=\"JavaScript\" type=\"text/javascript\">"
[7] "<!--"
[8] "function sortTable(theColumn,datatype,orderby) {"
[9] " document.getElementById(\"content\").innerHTML = \"Veuillez patientez ...\";"
[10] " var themaId = window.location.search.substr(1,window.location.search.length);"
[11] " var xslFile = \"styletable.xsl\";"
[12] " window.opener.mv_loadAttrTableFile(themaId,true);"
[13] " try {"
[14] "\ttry {"
[15] " var xslt = new ActiveXObject(\"Msxml2.XSLTemplate.4.0\");"
[16] " var xslDoc = new ActiveXObject(\"Msxml2.FreeThreadedDOMDocument.4.0\");"
[17] " } catch(e) {"
[18] " var xslt = new ActiveXObject(\"Msxml2.XSLTemplate\");"
[19] " var xslDoc = new ActiveXObject(\"Msxml2.FreeThreadedDOMDocument\");"
[20] " }"
[21] " xslDoc.async = false;"
[22] " xslDoc.resolveExternals = false;"
[23] " xslDoc.load(xslFile);"
[24] " xslt.stylesheet = xslDoc;"
[25] " var xslProc = xslt.createProcessor();"
[26] " xslProc.input = window.opener.mv_XMLFileArray[themaId].XMLFile;"
[27] " if (theColumn) {"
[28] " xslProc.addParameter(\"field\",\"f\" + (parseInt(theColumn) - 1));"
[29] " xslProc.addParameter(\"datatype\",datatype);"
[30] " xslProc.addParameter(\"orderby\",orderby);"
[31] " }"
[32] " xslProc.transform();"
[33] " content.innerHTML = xslProc.output;"
[34] " } "
[35] ""
[36] " catch(e) {"
[37] " var xsltProcessor = new XSLTProcessor(); "
[38] " var xslStylesheet = window.opener.mv_loadXMLDoc(window.opener.mv_Doc.BaseURL + \"embfiles/\" + xslFile,\"xml\");"
[39] " try {"
[40] " xsltProcessor.importStylesheet(xslStylesheet);"
[41] " }"
[42] " catch(err) {"
[43] " var xslStylesheet = document.implementation.createDocument(\"\", \"\", null);"
[44] " xslStylesheet.async = false;"
[45] " xslStylesheet.load(xslFile);"
[46] " xsltProcessor.importStylesheet(xslStylesheet);"
[47] " }"
[48] " if (theColumn) {"
[49] " xsltProcessor.setParameter(null,\"field\",\"f\" + (parseInt(theColumn) - 1));"
[50] " xsltProcessor.setParameter(null,\"datatype\",datatype);"
[51] " xsltProcessor.setParameter(null,\"orderby\",orderby);"
[52] " }"
[53] " var resultFragment = xsltProcessor.transformToFragment(window.opener.mv_XMLFileArray[themaId].XMLFile,document);"
[54] " document.getElementById(\"content\").innerHTML = \"\";"
[55] " document.getElementById(\"content\").appendChild(resultFragment);"
[56] " }"
[57] "}"
[58] "//-->"
[59] "</script>"
[60] "<title>Table attributaire</title>"
[61] "</head>"
[62] "<body onload=\"sortTable();\">"
[63] "<div id=\"content\">Veuillez patientez ...</div>"
[64] "</body>"
[65] "</html>"
[66] ""
Any ideas?
Thanks
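The printed source explains the empty result. getURL() does not run JavaScript, and even in a browser sortTable() reads the theme id from the query string (window.location.search.substr(1), here th0), then asks the opener window (the map page, through window.opener.mv_loadAttrTableFile and window.opener.mv_XMLFileArray) for the XML data and renders it with XSLT. Fetched on its own, the page therefore only ever contains the "Veuillez patientez ..." placeholder. The sketch below mirrors two pieces of that logic in Python: reading the theme id from the URL, and the "f" + (parseInt(theColumn) - 1) mapping from a 1-based column number to an XML field name. The helper names theme_id and field_name are hypothetical, not part of the site.

```python
from urllib.parse import urlparse


def theme_id(table_url: str) -> str:
    """Return the query string without '?', like window.location.search.substr(1)."""
    return urlparse(table_url).query


def field_name(column: int) -> str:
    """Map a 1-based table column to its field name, as sortTable() does."""
    return "f" + str(column - 1)


url = ("http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/"
       "c_vin01_coop_com07/embfiles/table.html?th0")
print(theme_id(url))   # th0
print(field_name(1))   # f0
```

This is why the first column of the table shows up as <f0> elements in the XML used by the answers below.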
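You can use the "Inspect Element" tool in Chrome to find out which requests are made when you click the table button.
That shows the data can be retrieved directly with this AJAX call:
http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/th0.xml
From there you can parse the response.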
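The request can be built from the map page URL: the page's own script loads files from BaseURL + "embfiles/", and the answer found the data at embfiles/th0.xml. The sketch below assumes every theme follows the same embfiles/<theme>.xml pattern, which is only confirmed for th0; xml_url and fetch_xml are hypothetical helper names.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

MAP_PAGE = ("http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/"
            "c_vin01_coop_com07/")


def xml_url(map_page: str, theme: str) -> str:
    # Assumption: each theme's data lives at embfiles/<theme>.xml
    # (only verified for th0).
    return urljoin(map_page, "embfiles/" + theme + ".xml")


def fetch_xml(map_page: str, theme: str) -> bytes:
    # Download the XML file directly, skipping the map page and table popup.
    with urlopen(xml_url(map_page, theme)) as resp:
        return resp.read()


print(xml_url(MAP_PAGE, "th0"))
```

Calling fetch_xml(MAP_PAGE, "th0") performs the network request; the returned bytes can then be handed to any XML parser.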
To parse XML or HTML, the XML package is a useful tool. Here is a proof of concept that gets the title using the XPath of the desired element:
> library(XML)
> library(RCurl)
> url <- "http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/th0.xml"
> doc <- htmlTreeParse(url, useInternalNodes = TRUE)
> title <- xpathSApply(doc, "//title[@id='titth0']", fun=xmlValue)
> title
[1] "Quantité livrée à la cave coopérative (hl)"
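The same XPath lookup also works with Python's standard library alone, since xml.etree.ElementTree supports the [@attr='value'] predicate. The sketch runs against a small hypothetical stand-in for th0.xml: only the <title id="titth0"> element and its text come from the R output above, and the <data> root is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for th0.xml; the real file's structure is not shown here.
SAMPLE = """<data>
  <title id="titth0">Quantité livrée à la cave coopérative (hl)</title>
</data>"""

root = ET.fromstring(SAMPLE)
# Equivalent of the R XPath //title[@id='titth0'], using ElementTree's XPath subset
title = root.find(".//title[@id='titth0']")
print(title.text)
```

On the real file, replace SAMPLE with the downloaded text; find() returns None if the element is absent, so check for that before reading .text.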
With Python, BeautifulSoup can do the scraping:
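from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

url = "http://www.si-vitifrance.com/docs/cvi/cvi13/cartes_inter/c_vin01_coop_com07/embfiles/th0.xml"
# Fetch the XML file behind the "table" button and parse it
soup = BeautifulSoup(urlopen(url), "html.parser")
# Each <f0> element holds one value of the first column (Commune)
for f0 in soup.find_all('f0'):
    print(f0.text)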
Output:
Commune
07- BOURG-SAINT-ANDEOL
07- VILLENEUVE-DE-BERG
07- LABLACHERE
...
07- BERRIAS-ET-CASTELJAU
07- BESSAS
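In the output above, the first <f0> value is the column header ("Commune") and the rest are the data. The sketch below, which uses only the standard library, separates the two and writes the column to CSV so it can be loaded into R with read.csv(). The sample XML is hypothetical: the <f0> values are copied from the output above, but the <data> wrapper is invented, and treating the first element as the header is inferred from that output.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical excerpt: <f0> values from the output above, wrapper invented.
SAMPLE = """<data>
  <f0>Commune</f0>
  <f0>07- BOURG-SAINT-ANDEOL</f0>
  <f0>07- VILLENEUVE-DE-BERG</f0>
  <f0>07- LABLACHERE</f0>
</data>"""


def column_values(xml_text: str, field: str = "f0"):
    """Return (header, values) for one field; the first element is the header."""
    texts = [el.text for el in ET.fromstring(xml_text).iter(field)]
    return texts[0], texts[1:]


header, values = column_values(SAMPLE)

# Write the column as CSV into an in-memory buffer (use open(...) to save a file)
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([header])
writer.writerows([v] for v in values)
print(buf.getvalue())
```

Passing field="f1" and so on would select the other columns, following the column-to-field mapping used by the page's sortTable() function.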