
Incomplete HttpWebResponse with large data sets

I have some code that downloads the content of a web page, and I've been using it for a while. It works fine and has never given me any trouble, and it still doesn't... except on one page that is rather large (2 MB, no images), with four tables of 4, 20, 100, and 600 rows respectively, each about 20 columns wide.

When I try to get all the data, the request completes without any apparent errors or exceptions, but it only returns up to about row 60 of the 4th table, sometimes more, sometimes less. The browser finishes loading the same page in about 20-30 seconds, with what look like constant flushes to the page until it completes.

I've tried a number of solutions from SO and other searches without any different results. Below is the current code; among the things I've already tried are a proxy, async requests, effectively removing the timeouts, and KeepAlive = false...

I can't use WebClient directly (as another far-fetched attempt) because I need to log in using the CookieContainer.
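(For completeness: I know the usual workaround is to subclass WebClient and override GetWebRequest so it carries a CookieContainer; a rough sketch of that is below, with CookieAwareWebClient as a made-up name, not a framework class. Still, I'd rather fix the HttpWebRequest version.)

    using System;
    using System.Net;

    // Rough sketch of the common workaround: a WebClient that attaches a
    // CookieContainer to every request it creates.
    public class CookieAwareWebClient : WebClient
    {
        private readonly CookieContainer cookies;

        public CookieAwareWebClient(CookieContainer cookies)
        {
            this.cookies = cookies;
        }

        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest request = base.GetWebRequest(address);
            HttpWebRequest httpRequest = request as HttpWebRequest;
            if (httpRequest != null)
            {
                // Reuse the login session's cookies on every request.
                httpRequest.CookieContainer = cookies;
            }
            return request;
        }
    }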

        HttpWebRequest pageImport = (HttpWebRequest)WebRequest.Create(importUri);
        pageImport.ReadWriteTimeout = Int32.MaxValue;
        pageImport.Timeout = Int32.MaxValue;
        pageImport.UserAgent = "User-Agent  Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3";
        pageImport.Accept = "Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        pageImport.KeepAlive = true;
        pageImport.MaximumResponseHeadersLength = Int32.MaxValue;

        if (null != LoginCookieContainer)
        {
            pageImport.CookieContainer = LoginCookieContainer;
        }

        Encoding encode = System.Text.Encoding.GetEncoding("utf-8");


        using (WebResponse response = pageImport.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, encode))
        {
            stream.Flush();
            HtmlRetrieved = reader.ReadToEnd();
        }

Try reading block-wise instead of reader.ReadToEnd(). Just to give you an idea:

    // Pipe the stream to a higher-level stream reader with the required encoding format.
    StreamReader readStream = new StreamReader(ReceiveStream, encode);
    Console.WriteLine("\nResponse stream received");

    // Read 256 characters at a time.
    Char[] read = new Char[256];
    int count = readStream.Read(read, 0, 256);
    Console.WriteLine("HTML...\r\n");

    while (count > 0)
    {
        // Dump the 256 characters into a string and display the string on the console.
        String str = new String(read, 0, count);
        Console.Write(str);
        count = readStream.Read(read, 0, 256);
    }
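If the goal is still to end up with the whole page in HtmlRetrieved, the same loop can accumulate into a StringBuilder instead of writing to the console. A minimal sketch, assuming stream and encode come from the question's code (replacing its ReadToEnd() reader) and that System.IO and System.Text are imported:

    StringBuilder pageBuilder = new StringBuilder();
    char[] buffer = new char[256];

    using (StreamReader blockReader = new StreamReader(stream, encode))
    {
        int charsRead;
        // Read() returns 0 only at the true end of the response,
        // so this keeps pulling blocks until the stream is exhausted.
        while ((charsRead = blockReader.Read(buffer, 0, buffer.Length)) > 0)
        {
            pageBuilder.Append(buffer, 0, charsRead);
        }
    }

    HtmlRetrieved = pageBuilder.ToString();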

I suspect this is governed by a configuration setting on the server side. Incidentally, I think you may be setting your properties incorrectly. Remove the "User-Agent" and "Accept" prefixes from the string literals, like so:

pageImport.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3";         
pageImport.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";      

While I'm still going to try the suggestions provided, and will change my answer if one of them works, it seems that in this case the problem IS the proxy. When I get in front of the proxy, the code works as expected and much more quickly.

I'll have to look at some proxy optimizations since this code must run behind the proxy.
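One thing I plan to try is configuring the proxy explicitly on the request instead of relying on automatic proxy detection, which is a known source of delays. A sketch of what I mean, where the proxy address is a placeholder and not our real one:

    // Placeholder address; substitute the real corporate proxy.
    WebProxy corporateProxy = new WebProxy("http://proxy.example.com:8080");
    corporateProxy.UseDefaultCredentials = true; // authenticate to the proxy as the current user
    pageImport.Proxy = corporateProxy;

    // Alternatively, if no proxy is actually needed for this host, disabling
    // automatic proxy detection entirely avoids its lookup overhead:
    // pageImport.Proxy = null;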
