How do I get the complete content from an HttpWebResponse when the response uses Transfer-Encoding: chunked?

Question

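I am writing a program that downloads HTML pages from other websites. For certain sites I cannot get the complete HTML; I only receive part of the content. The servers with this problem send their data with "Transfer-Encoding: chunked", and I suspect that is the cause.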

These are the response headers the server returns:

Transfer-Encoding: chunked
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Type: text/html; charset=UTF-8
Date: Sun, 11 Sep 2011 09:46:23 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Server: nginx/1.0.6

Here is my code:

HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
HttpWebResponse response;
CookieContainer cookie = new CookieContainer();
request.CookieContainer = cookie;
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.UserAgent =
    @"Mozilla/5.0 (Windows NT 6.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2 FirePHP/0.6";
request.Accept = @"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
string html = string.Empty;
response = request.GetResponse() as HttpWebResponse;

using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    html = reader.ReadToEnd();
}

I only get part of the HTML (I think it is the first chunk from the server). Can anyone help? Any suggestions?

Thanks!

Answer 1

You cannot use ReadToEnd to read chunked data. You need to read the bytes directly from the response stream with Stream.Read, in a loop.

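StringBuilder sb = new StringBuilder();
byte[] buf = new byte[8192];
int count; // number of bytes returned by each Read call
Stream resStream = response.GetResponseStream();

do
{
    // Read returns 0 only once the end of the stream has been reached
    count = resStream.Read(buf, 0, buf.Length);
    if (count != 0)
    {
        // UTF-8 is hardcoded here; a multi-byte character split across
        // two reads may be decoded incorrectly
        sb.Append(Encoding.UTF8.GetString(buf, 0, count));
    }
} while (count > 0);
string html = sb.ToString();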
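
As a possible refinement of the loop above, here is a minimal sketch that decodes with a stateful Decoder, so a multi-byte character that straddles two reads is not corrupted. The class name ResponseText and the method name ReadAll are hypothetical, not part of the original answer, and the caller is assumed to know the encoding (for example UTF-8, as the Content-Type header in the question states).

```csharp
using System;
using System.IO;
using System.Text;

static class ResponseText
{
    // Hypothetical helper: reads the stream until Read returns 0 and
    // decodes the bytes incrementally with a stateful Decoder.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        Decoder decoder = encoding.GetDecoder();
        byte[] buf = new byte[8192];
        // Large enough for the characters produced by one full buffer
        char[] chars = new char[encoding.GetMaxCharCount(buf.Length)];
        StringBuilder sb = new StringBuilder();
        int count;

        while ((count = stream.Read(buf, 0, buf.Length)) > 0)
        {
            // The decoder keeps any trailing partial character
            // and completes it on the next call
            int n = decoder.GetChars(buf, 0, count, chars, 0);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }
}
```

With the response object from the question, this could be called as ResponseText.ReadAll(response.GetResponseStream(), Encoding.UTF8).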

Answer 2

If I understand your request correctly, you can read the response line by line:

string htmlLine = reader.ReadLine();
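
A single ReadLine call returns only one line, so reading the whole page this way would need a loop. The following is a minimal sketch under that assumption; the class name LineReader and the method name ReadAllLines are hypothetical and do not come from the original answer.

```csharp
using System.IO;
using System.Text;

static class LineReader
{
    // Hypothetical helper: collects every line until ReadLine
    // returns null, which signals the end of the input.
    public static string ReadAllLines(TextReader reader)
    {
        StringBuilder sb = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // AppendLine adds Environment.NewLine, so the original
            // line endings are normalized
            sb.AppendLine(line);
        }
        return sb.ToString();
    }
}
```

It could be called with the StreamReader from the question, for example LineReader.ReadAllLines(reader).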


Solution 1
9 2011-11-12 09:25:45

Solution 2
-1 2011-11-12 09:07:24
