C# WebBrowser control cannot get generated HTML source
I am trying to get data that is generated by scripts, and I am using the WebBrowser control as described in: C# webbrowser Ajax call
My first attempt is:
    webBrowser1.Navigate("https://mobile.bet365.com/#type=Coupon;key=1-1-13-33977144-2-8-0-0-1-0-0-4100-0-0-1-0-0-0-0-0-0-0-0;ip=0;lng=1;anim=1");
    while (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
    {
        System.Threading.Thread.Sleep(10);
        Application.DoEvents();
    }
    File.WriteAllText(@"C:\pagesource.txt", webBrowser1.DocumentText);
The page source I get is not what the browser shows. When I modify the code as below:
    webBrowser1.Navigate("https://mobile.bet365.com/#type=Coupon;key=1-1-13-33977144-2-8-0-0-1-0-0-4100-0-0-1-0-0-0-0-0-0-0-0;ip=0;lng=1;anim=1");
    while (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
    {
        System.Threading.Thread.Sleep(10);
        Application.DoEvents();
    }
    MessageBox.Show("Loading completed");
    File.WriteAllText(@"C:\pagesource.txt", webBrowser1.DocumentText);
Of course, I then have to press OK when the dialog is shown, but the page source is now correct.
I don't understand why it behaves like that. I just want to get the page source automatically (without any clicks or user actions).
If the WebBrowser control is not required, I would try switching to a different method of obtaining the page source (which also avoids the overhead of the WebBrowser control).
Please note that parsing HTML source is fragile: as soon as the page layout changes or additional JavaScript kicks in, you can run into problems. For retrieving data from web pages, you should look for something like an RSS feed, which you can parse more reliably than the HTML page source.
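To illustrate why a feed is easier to work with than raw HTML, here is a minimal sketch that reads the items of an RSS 2.0 document with System.Xml.Linq. The sample feed, its titles and its example.com links are made up for illustration; a real feed would be downloaded first, and an Atom feed would need its XML namespace taken into account.

```csharp
using System;
using System.Xml.Linq;

static class RssExample
{
    static void Main()
    {
        // Hypothetical sample feed (not taken from any real site).
        const string sample =
            "<rss version=\"2.0\"><channel><title>Example</title>" +
            "<item><title>First</title><link>http://example.com/1</link></item>" +
            "<item><title>Second</title><link>http://example.com/2</link></item>" +
            "</channel></rss>";

        XDocument doc = XDocument.Parse(sample);

        // RSS 2.0 elements are not namespaced, so plain element names work here.
        foreach (XElement item in doc.Descendants("item"))
        {
            string title = (string)item.Element("title");
            string link = (string)item.Element("link");
            Console.WriteLine($"{title} -> {link}");
        }
    }
}
```

Because the structure is fixed by the feed format, this kind of code is much less likely to break when the site changes its page layout.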
However, I could not test the following code against your URL because it is currently undergoing maintenance. I tested it against my own page and it worked there. Naturally, my own page does not use as much JavaScript as your URL.
Below are three different methods of obtaining the page source:
    string url = "https://example.com/"; // placeholder: replace with the page you want to download
    string pageSource1 = null, pageSource2 = null, pageSource3 = null;
    try
    {
        using (System.Net.WebClient webClient = new System.Net.WebClient())
        {
            // Perhaps fake a browser user agent? Note that the header name is "User-Agent".
            webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36 OPR/51.0.2830.55");

            // Option 1: WebClient.DownloadString (simple call)
            pageSource1 = webClient.DownloadString(url);

            // Option 2: read from a stream (if you prefer a stream, e.g. to avoid reading the whole page)
            using (var webClientStream = webClient.OpenRead(url))
            {
                if (webClientStream != null)
                {
                    using (System.IO.StreamReader streamReader = new System.IO.StreamReader(webClientStream))
                    {
                        pageSource2 = streamReader.ReadToEnd();
                    }
                }
            }
        }

        // Option 3: WebRequest/WebResponse (lets you rebuild browser behavior, e.g. walking through pages)
        System.Net.WebRequest webRequest = System.Net.WebRequest.Create(url);
        webRequest.Method = "GET";
        using (var webResponse = webRequest.GetResponse()) // dispose the response to release the connection
        {
            var webResponseStream = webResponse.GetResponseStream();
            if (webResponseStream != null)
            {
                using (System.IO.StreamReader streamReader = new System.IO.StreamReader(webResponseStream))
                {
                    pageSource3 = streamReader.ReadToEnd();
                }
            }
        }
    }
    catch (System.Net.WebException exc) // network or HTTP errors
    {
        Console.WriteLine($"Unable to download page source: {exc.Message}");
        // todo - safely handle...
    }
    catch (System.IO.IOException exc) // stream errors
    {
        Console.WriteLine($"Unable to download page source: {exc.Message}");
        // todo - safely handle...
    }
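One caveat: the plain HTTP calls above do not execute JavaScript, so content that the page builds with scripts will not appear in the downloaded source. If the WebBrowser control has to be kept for that reason, a likely explanation for the behavior in the question is that ReadyState reaches Complete before the page's scripts have finished, and the message box simply gives them time to run. The following is only a sketch under stated assumptions: the element id "content", the placeholder URL, the output file name and the polling limits are all hypothetical and would need to be adapted to the real page. It polls the live DOM with a WinForms timer instead of blocking, and writes Document.Body.OuterHtml once the expected element exists.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

public class ScrapeForm : Form
{
    private readonly WebBrowser browser =
        new WebBrowser { Dock = DockStyle.Fill, ScriptErrorsSuppressed = true };
    // System.Windows.Forms.Timer raises Tick on the UI thread.
    private readonly Timer pollTimer = new Timer { Interval = 500 };
    private int attempts;
    private bool done;

    public ScrapeForm()
    {
        Controls.Add(browser);
        // DocumentCompleted can fire more than once (e.g. for frames).
        browser.DocumentCompleted += (s, e) => { if (!done) pollTimer.Start(); };
        pollTimer.Tick += OnPoll;
        browser.Navigate("https://example.com/"); // placeholder URL
    }

    private void OnPoll(object sender, EventArgs e)
    {
        attempts++;
        HtmlDocument doc = browser.Document;
        // "content" is a hypothetical id of an element created by the page's scripts.
        bool ready = doc != null && doc.GetElementById("content") != null;
        if (!ready && attempts < 40)
        {
            return; // keep waiting, roughly 20 seconds at most
        }

        pollTimer.Stop();
        done = true;
        if (doc != null && doc.Body != null)
        {
            // OuterHtml is read from the current DOM, including script-generated nodes.
            File.WriteAllText("pagesource.txt", doc.Body.OuterHtml);
        }
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new ScrapeForm());
    }
}
```

Polling with a timer keeps the message loop running normally, which avoids the Thread.Sleep/DoEvents loop and needs no user interaction.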
Hope this helps!