
Incomplete HttpWebResponse with large data sets

I have some code that downloads the content of a webpage, which I've been using for a while. This code works fine and has never caused a problem, and still doesn't... However, there is one page that is rather large (2 MB, no images) with 4 tables of 4, 20, 100, and 600 rows respectively, each about 20 columns wide.

When trying to get all the data, the request completes without any apparent errors or exceptions but only returns up to about row 60 of the 4th table - sometimes more, sometimes less. The browser finishes loading the page in about 20-30 seconds, with what look like constant flushes to the page until it completes.

I've tried a number of solutions from SO and searches without any different results. Below is the current code, but I've already tried: a proxy, async, no timeouts, KeepAlive = false...

I can't use WebClient (as another far-fetched attempt) because I need to log in using the CookieContainer.

        HttpWebRequest pageImport = (HttpWebRequest)WebRequest.Create(importUri);
        pageImport.ReadWriteTimeout = Int32.MaxValue;
        pageImport.Timeout = Int32.MaxValue;
        pageImport.UserAgent = "User-Agent  Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3";
        pageImport.Accept = "Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        pageImport.KeepAlive = true;
        pageImport.MaximumResponseHeadersLength = Int32.MaxValue;

        if (null != LoginCookieContainer)
        {
            pageImport.CookieContainer = LoginCookieContainer;
        }

        Encoding encode = System.Text.Encoding.GetEncoding("utf-8");


        using (WebResponse response = pageImport.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, encode))
        {
            stream.Flush();
            HtmlRetrieved = reader.ReadToEnd();
        }

Try reading block-wise instead of reader.ReadToEnd(). Just to give you an idea:

    // Pipe the stream to a higher-level stream reader with the required encoding format.
    StreamReader readStream = new StreamReader(ReceiveStream, encode);
    Console.WriteLine("\nResponse stream received");
    Char[] read = new Char[256];

    // Read 256 characters at a time.
    int count = readStream.Read(read, 0, 256);
    Console.WriteLine("HTML...\r\n");

    while (count > 0)
    {
        // Dump the 256 characters into a string and display it on the console.
        String str = new String(read, 0, count);
        Console.Write(str);
        count = readStream.Read(read, 0, 256);
    }
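A self-contained version of the same block-wise approach, accumulating into a StringBuilder instead of writing to the console. This is a sketch, not the asker's actual code: a MemoryStream stands in for response.GetResponseStream() so it can run without a network connection.

```csharp
using System;
using System.IO;
using System.Text;

class BlockwiseRead
{
    // Reads the entire stream in 256-character chunks and returns the result.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        var builder = new StringBuilder();
        using (var reader = new StreamReader(stream, encoding))
        {
            char[] buffer = new char[256];
            int count;
            while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                builder.Append(buffer, 0, count);
            }
        }
        return builder.ToString();
    }

    static void Main()
    {
        // A MemoryStream stands in for the real response stream here.
        byte[] bytes = Encoding.UTF8.GetBytes(new string('x', 1000));
        string result = ReadAll(new MemoryStream(bytes), Encoding.UTF8);
        Console.WriteLine(result.Length); // 1000
    }
}
```

Functionally this returns the same string as ReadToEnd(), but looping in small chunks makes it easy to log progress and see exactly where a truncated response stops.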

I suspect this is handled as a configuration setting on the server side. Incidentally, I think you may be setting your properties incorrectly. Remove the "User-Agent" and "Accept" prefixes from the literals, like so:

pageImport.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3";         
pageImport.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";      

While I'm still going to try the suggestions provided (and will change my answer if one works), it seems that in this case the problem IS the proxy. I got in front of the proxy and the code works as expected, and much more quickly.

I'll have to look at some proxy optimizations since this code must run behind the proxy.我将不得不查看一些代理优化,因为此代码必须在代理后面运行。
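One place to start when the proxy is the bottleneck is how the request resolves its proxy settings. A minimal sketch, assuming .NET Framework's HttpWebRequest; the URIs are placeholders, and no connection is made until GetResponse() is called:

```csharp
using System;
using System.Net;

class ProxyConfig
{
    static void Main()
    {
        // Placeholder URI; creating the request does not open a connection.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/");

        // Option 1: bypass any system-configured proxy entirely
        // (useful for testing whether the proxy is the culprit).
        request.Proxy = null;

        // Option 2: point at the corporate proxy explicitly, reusing the
        // process credentials (proxy address is a placeholder):
        // request.Proxy = new WebProxy("http://proxy.internal:8080")
        // {
        //     Credentials = CredentialCache.DefaultCredentials
        // };

        Console.WriteLine(request.Proxy == null ? "proxy bypassed" : "proxy set");
    }
}
```

Setting Proxy explicitly also avoids the per-request cost of automatic proxy detection, which can add noticeable latency on the first request.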
