Getting HTML from a webpage after executing a javascript link
There is a web page that loads a table when you click a JavaScript link (a link tag with javascript:... in it). I need to get this table into my ASP.NET website. There is no URL that returns the table without executing any scripts. This is what I am currently using:
public string GetFromUrl(string path)
{
    WebClient web = new WebClient();
    return web.DownloadString(path);
}

public string GetTagHTML(string html)
{
    Regex regex = new Regex("<table>(.*)</table>");
    var v = regex.Match(html);
    return v.Groups[1].ToString();
}
More info
The website I am trying to get data from is http://beitbiram.iscool.co.il/default.aspx (it's in Hebrew; the link I am trying to click is one of the table titles).
The website is an ASP.NET website.
The function that the link calls is __doPostBack. I don't know what it does and can't find any information about it online, but this is its code:
var theForm = document.forms['Form'];
if (!theForm) {
    theForm = document.Form;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
Thanks in advance.
In general, the only way to get the HTML of a page after running JavaScript is to run the JavaScript, and that requires a browser.
The direct answer to your question, then, is to use something like Headless Chrome to spin up a browser, load the page, click the link, and export the HTML. This has historically been a massive pain to get working, although Headless Chrome is supposed to be rather less painful.
However, the javascript: link you run must get its data from somewhere in order to put it into the table, so I would strongly advise looking for that source and building the table yourself. I certainly wouldn't want to maintain a website with an embedded browser unless I absolutely had to.
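As a rough illustration of "looking for that source": judging by the __doPostBack code in the question, the link only fills in two hidden fields (__EVENTTARGET and __EVENTARGUMENT) and submits the form, so the data probably comes from an ordinary form POST back to the same page. The sketch below, written for Node.js 18 or later, builds that POST body from the page's hidden inputs. The function names (collectHiddenFields, buildPostBackBody, fetchAfterPostBack) are made up for this example, and it assumes that the form posts back to the URL it was loaded from, that the hidden inputs use quoted attributes, and that no cookies are needed. None of this has been checked against the actual site.

```javascript
// Decode the few HTML entities that commonly appear in attribute values.
function decodeEntities(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// Read one quoted attribute from a single tag's source text, or null if absent.
function getAttribute(tag, name) {
  const pattern = new RegExp('\\s' + name + '\\s*=\\s*("([^"]*)"|\'([^\']*)\')', 'i');
  const match = pattern.exec(tag);
  if (!match) return null;
  return decodeEntities(match[2] !== undefined ? match[2] : match[3]);
}

// Collect every hidden input on the page (e.g. __VIEWSTATE) as name/value pairs.
function collectHiddenFields(html) {
  const fields = {};
  for (const tag of html.match(/<input\b[^>]*>/gi) || []) {
    const type = getAttribute(tag, 'type');
    const name = getAttribute(tag, 'name');
    if (name && type && type.toLowerCase() === 'hidden') {
      fields[name] = getAttribute(tag, 'value') || '';
    }
  }
  return fields;
}

// Build the URL-encoded body that theForm.submit() would send after
// __doPostBack has set the two event fields.
function buildPostBackBody(html, eventTarget, eventArgument) {
  const fields = collectHiddenFields(html);
  fields.__EVENTTARGET = eventTarget;
  fields.__EVENTARGUMENT = eventArgument;
  return new URLSearchParams(fields).toString();
}

// Assumption: the form posts back to the same URL the page was loaded from.
async function fetchAfterPostBack(pageUrl, eventTarget, eventArgument = '') {
  const page = await fetch(pageUrl);
  const body = buildPostBackBody(await page.text(), eventTarget, eventArgument);
  const response = await fetch(pageUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body,
  });
  return response.text();
}
```

The eventTarget value to pass is whatever appears inside the link's javascript:__doPostBack('...','') call in the page source. If this works for the site, the same fields can be sent from the ASP.NET side with WebClient.UploadValues and a NameValueCollection, which avoids the embedded browser entirely.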