
C# parsing HTML with JavaScript

I need to parse the HTML of a document after the JavaScript inside that document has executed. I use the WebBrowser control to download and manipulate the HTML.

For example, my HTML contains a script like this:

<script type="text/javascript" src="http://site.com/script.js"></script>

Thanks for your answers.

P.S. To clarify: I must parse the whole document, including any text that the JavaScript may produce. So I can parse the document only after the JavaScript has executed, because I need some of the dynamic content that the JavaScript adds.

Added

I did get the document together with the JavaScript-generated content. I overlooked this at first, because I was looking for content inside an iframe that was itself generated with JavaScript.

Now I have another question. My document contains a few iframes, and I am trying to get the content of some of them in the following way:

        var htmlcol = webBrowser1.Document.Window.Frames;
        foreach (HtmlWindow item in htmlcol)
        {
            try
            {
                Console.Write(item.Name);
            }
            catch (System.Exception ex)
            {
                MessageBox.Show("Something wrong");
            }

        }

But this throws a 'System.UnauthorizedAccessException'. How can I get access to the HTML of the frames?

P.P.S. Sorry for my bad English :)

I think you will have a better experience using the DOM that the WebBrowser exposes through its Document property.

You can either traverse the nested elements of Body, or find what you want using GetElementById or GetElementsByTagName.

The DOM should be updated automatically to reflect the changes the JavaScript makes to the page.
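
As a rough sketch of that approach (not a definitive implementation), the snippet below waits for the DocumentCompleted event and then reads the DOM through Document. The address "http://site.com/", the element id "content", and the class name DomAfterScriptSketch are placeholders assumed for illustration. Note that content a script adds asynchronously after the page has loaded may still be missing when the event fires, so you might need to wait a little longer in that case.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical example class; requires a reference to System.Windows.Forms.
static class DomAfterScriptSketch
{
    [STAThread]
    static void Main()
    {
        var form = new Form();
        var browser = new WebBrowser
        {
            Dock = DockStyle.Fill,
            ScriptErrorsSuppressed = true // do not pop up script error dialogs
        };
        form.Controls.Add(browser);

        browser.DocumentCompleted += (sender, e) =>
        {
            // The event can fire more than once (for example, once per frame),
            // so only continue when the whole page reports it is complete.
            if (browser.ReadyState != WebBrowserReadyState.Complete) return;

            HtmlDocument doc = browser.Document;
            if (doc == null) return;

            // "content" is an assumed id; use the id of the element you need.
            HtmlElement target = doc.GetElementById("content");
            if (target != null)
                Console.WriteLine(target.InnerText);

            // Walk all elements with a given tag name, here every link.
            foreach (HtmlElement link in doc.GetElementsByTagName("a"))
                Console.WriteLine(link.GetAttribute("href"));
        };

        browser.Navigate("http://site.com/"); // placeholder address
        Application.Run(form);
    }
}
```

Reading Document inside the event handler, rather than right after Navigate, matters because Navigate returns before the page and its scripts have been loaded.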

Take a look at PhantomJS for this, and use setTimeout after opening the page so its scripts have time to run.

It can look like this:

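var page = require('webpage').create();

page.open("https://sample.com", function(){
    page.evaluate(function(){
        // Run anything you need in the page context before reloading, for example:
        localStorage.setItem("something", "whatever"); // set localStorage before the page is opened again
    });

    page.open("https://sample.com", function(){
        // Give the page's scripts time to finish before reading the result
        setTimeout(function(){
            console.log(page.content); // page source after the JavaScript has run

            // Save a screenshot wherever you want
            page.render("screenshot.png");

            // You can query the content with jQuery, provided the page itself loads it.
            // evaluate() can only return serializable values, so return the HTML
            // string rather than the jQuery object.
            var fbcomments = page.evaluate(function(){
                return $("body").contents().find(".content").html();
            });
            console.log(fbcomments);

            phantom.exit();
        }, 10000);
    });
});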

Try the following: add a reference to Microsoft.mshtml to your application.

Then try:

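using mshtml;

// The class name is a placeholder; the original snippet did not show it.
public class ScriptRunner
{
    // Script window of the loaded page, set through setPage
    private IHTMLWindow2 window;

    // Pass in the page's window, e.g.
    // ((IHTMLDocument2)webBrowser1.Document.DomDocument).parentWindow
    public void setPage(IHTMLWindow2 JSFile)
    {
        window = JSFile;
    }

    public void scriptPrint()
    {
        // Calls the report_back function defined by the page's own JavaScript
        window.execScript("report_back('Printing complete!')", "JScript");
    }
}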

Here's also an article that might help you: http://www.dotnetcurry.com/ShowArticle.aspx?ID=194
