C# WebBrowser control cannot get generated HTML source

I am trying to get data that is generated by scripts. I am using a WebBrowser control, following the approach described in: C# webbrowser Ajax call

My first attempt is:

webBrowser1.Navigate("https://mobile.bet365.com/#type=Coupon;key=1-1-13-33977144-2-8-0-0-1-0-0-4100-0-0-1-0-0-0-0-0-0-0-0;ip=0;lng=1;anim=1");
while (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
{
    System.Threading.Thread.Sleep(10);
    Application.DoEvents();
}
File.WriteAllText(@"C:\pagesource.txt", webBrowser1.DocumentText);

The page source I get is not what the browser shows. When I modify the code as below:

webBrowser1.Navigate("https://mobile.bet365.com/#type=Coupon;key=1-1-13-33977144-2-8-0-0-1-0-0-4100-0-0-1-0-0-0-0-0-0-0-0;ip=0;lng=1;anim=1");
while (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
{
    System.Threading.Thread.Sleep(10);
    Application.DoEvents();
}
MessageBox.Show("Loading completed");
File.WriteAllText(@"C:\pagesource.txt", webBrowser1.DocumentText);

and then press OK when the dialog appears (which of course I have to do by hand), the page source is correct.

I don't understand how that can be. I just want to get the page source automatically, without any clicks or other user actions.
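One plausible explanation, which the thread itself does not confirm: `ReadyState` reaches `Complete` once the initial document has loaded, but the page's scripts may still be fetching and inserting the data afterwards. `MessageBox.Show` runs its own modal message loop, so while the dialog is open the control keeps processing messages and the scripts get time to finish. Instead of a dialog, you can keep pumping messages until the content you need actually exists. The sketch below assumes that approach; `PageWait`, `WaitUntil` and the element id `"coupon"` are hypothetical names, not part of any library or of the original post. It is also often reported that `DocumentText` returns the HTML as originally loaded, while `webBrowser1.Document.Body.OuterHtml` reflects the current DOM; treat that as something to verify on your page.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PageWait
{
    // Polls `condition` until it returns true or `timeout` elapses.
    // `pumpMessages` runs on every iteration; in a WinForms app you would pass
    // Application.DoEvents so the WebBrowser control keeps running its scripts.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout,
                                 Action pumpMessages, int pollMilliseconds = 10)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (!condition())
        {
            if (stopwatch.Elapsed >= timeout)
                return false; // gave up: the expected content never appeared
            pumpMessages();
            Thread.Sleep(pollMilliseconds);
        }
        return true;
    }

    // Hypothetical WinForms usage ("coupon" is an assumed element id):
    //
    // bool loaded = PageWait.WaitUntil(
    //     () => webBrowser1.ReadyState == WebBrowserReadyState.Complete
    //           && webBrowser1.Document?.GetElementById("coupon") != null,
    //     TimeSpan.FromSeconds(30),
    //     Application.DoEvents);
    // if (loaded)
    //     File.WriteAllText(@"C:\pagesource.txt", webBrowser1.Document.Body.OuterHtml);
}
```

Waiting for a concrete element rather than a fixed delay means the loop ends as soon as the data is there, and the timeout keeps it from hanging forever if the page never produces it.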

Since a WebBrowser control is not strictly required here, I would try switching to a different method of obtaining the page source (which also avoids the overhead of the WebBrowser control).
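One caveat about non-browser methods: the URL in the question carries all of its parameters after `#`. The fragment is never sent to the server, so a plain HTTP download requests only `/` and gets the same document no matter what follows `#`; presumably the page's scripts read the fragment and load the coupon data afterwards (an assumption about this particular site). The small sketch below uses `System.Uri` to show which part an HTTP client actually requests; `FragmentCheck` and `Split` are hypothetical names.

```csharp
using System;

public static class FragmentCheck
{
    // Splits a URL into the request target an HTTP client sends to the server
    // (path + query) and the fragment, which stays in the browser for scripts to read.
    public static (string RequestTarget, string Fragment) Split(string url)
    {
        Uri uri = new Uri(url);
        return (uri.PathAndQuery, uri.Fragment);
    }
}
```

If the data you need depends on the fragment, a raw download will most likely not contain it, and you would instead have to find the request the scripts make (for example in the browser's developer tools) and call that directly.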

Please note that scraping HTML source is fragile: as soon as the page layout changes or additional JavaScript kicks in, you can run into problems. To retrieve data from web pages, look for a structured source such as an RSS feed, which you can parse far more reliably than the HTML page source.
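If a feed is available (whether this site offers one is not known), parsing it is straightforward with LINQ to XML. A minimal sketch, assuming a plain RSS 2.0 feed (`rss/channel/item`) without XML namespaces on those elements; `RssReader` and `ItemTitles` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssReader
{
    // Returns the <title> of every <item> in an RSS 2.0 document.
    // Items without a <title> yield an empty string.
    public static List<string> ItemTitles(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);
        return doc.Descendants("item")
                  .Select(item => (string)item.Element("title") ?? "")
                  .ToList();
    }
}
```

Unlike the HTML, the element names in a feed are part of a published format, so this code does not break when the site's visual layout changes.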

However, I could not test the following code against your URL because that site is currently undergoing maintenance. I tested it against my own page instead, and it worked there. Naturally, my own page uses far less JavaScript than yours.

Below are three different methods of obtaining the page source:

string url = "https://example.com/"; // placeholder: the address of the page to download
string pageSource1 = null, pageSource2 = null, pageSource3 = null;
try
{
    using (System.Net.WebClient webClient = new System.Net.WebClient())
    {
        // Optionally send a browser-like User-Agent; some sites serve different
        // content (or reject the request) for unfamiliar clients.
        webClient.Headers.Add(System.Net.HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36 OPR/51.0.2830.55");

        // Option 1: WebClient.DownloadString (a single, simple call)
        pageSource1 = webClient.DownloadString(url);

        // Option 2: read through a stream (useful if you prefer streaming,
        // e.g. to stop before reading the whole page)
        System.IO.Stream webClientStream = webClient.OpenRead(url);
        if (webClientStream != null)
        {
            // Disposing the reader also closes the underlying stream
            using (System.IO.StreamReader streamReader = new System.IO.StreamReader(webClientStream))
            {
                pageSource2 = streamReader.ReadToEnd();
            }
        }
    }

    // Option 3: WebRequest/WebResponse (finer control over each request,
    // e.g. to reproduce browser behavior such as walking through several pages)
    System.Net.WebRequest webRequest = System.Net.WebRequest.Create(url);
    webRequest.Method = "GET";

    using (System.Net.WebResponse webResponse = webRequest.GetResponse())
    using (System.IO.Stream webResponseStream = webResponse.GetResponseStream())
    using (System.IO.StreamReader streamReader = new System.IO.StreamReader(webResponseStream))
    {
        pageSource3 = streamReader.ReadToEnd();
    }
}
catch (System.Net.WebException exc) // network or HTTP errors
{
    Console.WriteLine($"Unable to download page source: {exc.Message}");
    // TODO: handle the error safely
}
catch (System.IO.IOException exc) // errors while reading the stream
{
    Console.WriteLine($"Unable to download page source: {exc.Message}");
    // TODO: handle the error safely
}
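On current .NET (6 and later), `WebClient` and `WebRequest` are marked obsolete (warning SYSLIB0014) in favor of `HttpClient`. A minimal sketch of the same download with `HttpClient` follows; `PageDownloader`, `GetPageSourceAsync` and the stub `FixedResponseHandler` are hypothetical names, and the stub exists only so the code can be exercised without network access. Like the three options above, it returns only what the server sends; no JavaScript is executed.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class PageDownloader
{
    // Downloads the raw HTML of `url` using the given HttpClient.
    public static async Task<string> GetPageSourceAsync(HttpClient client, string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            // Optional browser-like User-Agent (whether the site needs it is an assumption).
            request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");

            using (HttpResponseMessage response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode(); // throws HttpRequestException on 4xx/5xx
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}

// Hypothetical stub handler that returns a fixed body instead of going to the network.
public sealed class FixedResponseHandler : HttpMessageHandler
{
    private readonly string _body;

    public FixedResponseHandler(string body) => _body = body;

    // User-Agent header of the last request this handler received.
    public string LastUserAgent { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        LastUserAgent = request.Headers.UserAgent.ToString();
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_body)
        };
        return Task.FromResult(response);
    }
}
```

In a real application you would create one `HttpClient` with the default handler, reuse it for all requests, and `await` the call from an `async` event handler rather than blocking the UI thread.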

Hope this helps!

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM