Getting HTML from a webpage after executing a javascript link

There is a web page that loads a table when you click a javascript link (an <a> tag with javascript:... in it). I need to get this table into my ASP.NET website. No URL returns the table without running the script. This is what I am currently using:

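using System.Net;
using System.Text.RegularExpressions;

// Downloads the raw HTML the server returns for the URL; no scripts are run.
public string GetFromUrl(string path)
{
    using (WebClient web = new WebClient())
    {
        return web.DownloadString(path);
    }
}

// Returns everything between the first <table ...> and the last </table>,
// or an empty string if the page contains no table.
public string GetTagHTML(string html)
{
    // Singleline lets '.' span line breaks; [^>]* allows attributes on <table>.
    Regex regex = new Regex("<table[^>]*>(.*)</table>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    var v = regex.Match(html);
    return v.Groups[1].ToString();
}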

More info

The website I am trying to get data from is http://beitbiram.iscool.co.il/default.aspx (it's in Hebrew; the link I am trying to click is one of the table titles).

The website is an ASP.NET website.

The function that the link calls is __doPostBack. I have no idea what it does and can't find any information about it online, but this is its code:

var theForm = document.forms['Form'];
if (!theForm) {
    theForm = document.Form;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}

Thanks in advance.

In general, the only way to get the HTML of a page after running JavaScript is to run the JavaScript, and that requires a browser.

The direct answer to your question, then, is to use something like Headless Chrome to spin up a browser, load the page, click the link, and export the HTML. This has historically been a massive pain to get working, although Headless Chrome is supposed to be rather less painful.
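
The browser route could look roughly like the sketch below. It assumes the Selenium WebDriver packages for .NET (Selenium.WebDriver and Selenium.Support) and a matching ChromeDriver are installed; the answer itself does not name a driver library, so that choice is mine. The class name, method name, and linkText parameter are made up for illustration, and the wait condition is only a placeholder.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class HeadlessFetch
{
    // Hypothetical helper: loads the page in headless Chrome, clicks the link
    // whose visible text is linkText, and returns the HTML after the click.
    public static string GetHtmlAfterClick(string url, string linkText)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");

        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(url);
            driver.FindElement(By.LinkText(linkText)).Click();

            // Placeholder condition: replace it with a check for an element
            // that only exists once the table has actually been loaded.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
            wait.Until(d => d.FindElements(By.TagName("table")).Count > 0);

            return driver.PageSource;
        }
    }
}
```

The returned string can then go through the same table extraction as before. Starting a browser per request is slow, so caching the result would probably be worth it.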

However, the javascript: link you run must get the data from somewhere in order to put it into the table, so I would strongly advise looking for that source and building the table yourself, because I certainly wouldn't want to maintain a website with an embedded browser unless I absolutely had to.
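
In this case the source is fairly visible: the __doPostBack code in the question only sets two hidden fields (__EVENTTARGET and __EVENTARGUMENT) and submits the form, so the table most likely arrives in the response to an ordinary form POST. The sketch below replays that POST without a browser. It assumes the form posts back to the same URL it was loaded from, and that copying the page's hidden fields (ASP.NET Web Forms usually includes __VIEWSTATE among them) is enough for the server to accept the request. PostBackClient and its method names are hypothetical, and the regex parsing is deliberately minimal.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class PostBackClient
{
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Reads one double-quoted attribute from a single tag; null if absent.
    private static string Attr(string tag, string name)
    {
        var m = Regex.Match(tag, @"\s" + Regex.Escape(name) + @"\s*=\s*""([^""]*)""",
                            RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Collects name/value pairs of every <input type="hidden"> in the page.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            string tag = m.Value;
            if (!string.Equals(Attr(tag, "type"), "hidden", StringComparison.OrdinalIgnoreCase))
                continue;
            string name = Attr(tag, "name");
            if (name != null)
                fields[name] = Attr(tag, "value") ?? "";
        }
        return fields;
    }

    // Mirrors __doPostBack: keep the hidden fields, set the two event fields.
    public static Dictionary<string, string> BuildPostBackForm(
        string html, string eventTarget, string eventArgument)
    {
        var form = ExtractHiddenFields(html);
        form["__EVENTTARGET"] = eventTarget;
        form["__EVENTARGUMENT"] = eventArgument;
        return form;
    }

    // GETs the page, then POSTs the form back to the same URL (an assumption)
    // and returns the HTML of the response.
    public static async Task<string> FetchAfterPostBackAsync(
        string url, string eventTarget, string eventArgument)
    {
        using (var client = new HttpClient())
        {
            string html = await client.GetStringAsync(url);
            var form = BuildPostBackForm(html, eventTarget, eventArgument);
            using (var response = await client.PostAsync(url, new FormUrlEncodedContent(form)))
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}
```

The two arguments to pass are the ones in the link itself, which should look like javascript:__doPostBack('someTarget','someArgument'). A call would then be something like await PostBackClient.FetchAfterPostBackAsync("http://beitbiram.iscool.co.il/default.aspx", "someTarget", ""), with the placeholder target replaced by the real value copied from the page source.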
