How to determine whether content returned by .NET HttpClient is Gzipped?

I need to download some content from a remote URL and then also determine whether that content was sent compressed (Gzip or Deflate).

My issue is that when you allow the HttpClient to perform automatic decompression, it doesn't return any value in the response.Content.Headers.ContentEncoding property. If you don't enable automatic decompression, it does return the correct value for ContentEncoding, but then you are left with a Gzipped document that hasn't been decompressed, which is not useful.

Take the following code:

var handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};

using (var client = new HttpClient(handler))
{
    client.DefaultRequestHeaders.Add("accept-encoding", "gzip, deflate");
    client.DefaultRequestHeaders.Add("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");

    using (var message = new HttpRequestMessage(HttpMethod.Get, new Uri("https://www.twitter.com")))
    {
        using (var response = await client.SendAsync(message))
        {
            if (response.IsSuccessStatusCode)
            {
                string encoding = String.Join(",", response.Content.Headers.ContentEncoding);

                string content = await response.Content.ReadAsStringAsync();
            }
        }
    }
}

When the HttpClientHandler is set to use AutomaticDecompression, the content is requested as GZip and then decompressed correctly into the content variable. But the ContentEncoding value in the response headers collection is empty.

If I remove the line:

AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate

then I do get the correct ContentEncoding value ("gzip") returned, but the document is returned in its raw compressed format, which is no good.

So is there any way to get content that may sometimes (but not always) be Gzipped, automatically decompress it when it is, and still know afterward whether it was originally sent as Gzip?

Not a full answer, but I peeked through the source code of HttpClient, and that led me to the code of the underlying HttpWebResponse. In there, you find this nugget:

  if ((decompressionMethod & DecompressionMethods.GZip) != DecompressionMethods.None && str.IndexOf("gzip", StringComparison.CurrentCulture) != -1)
  {
    this.m_ConnectStream = (Stream) new GZipWrapperStream(this.m_ConnectStream, CompressionMode.Decompress);
    this.m_ContentLength = -1L;
    this.m_HttpResponseHeaders["Content-Encoding"] = (string) null;
  }

As you can see, on the last line, they're removing that header altogether. I'm not entirely sure why that's what they decided to do, but it is what it is.

I guess your options are to either unzip it yourself or to make two requests (neither of which is a great option).
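To illustrate the "unzip it yourself" option, here is a minimal sketch, not a definitive implementation. It assumes automatic decompression is left off on the handler, that the server sends at most one Content-Encoding token, and that the body is UTF-8 text. The names `EncodingAwareDownloader`, `WrapForDecoding` and `FetchAsync` are made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class EncodingAwareDownloader
{
    // Hypothetical helper: wraps the raw body stream in the decompressor that
    // matches the Content-Encoding token, or returns it unchanged otherwise.
    public static Stream WrapForDecoding(Stream raw, string contentEncoding)
    {
        switch ((contentEncoding ?? string.Empty).Trim().ToLowerInvariant())
        {
            case "gzip":
                return new GZipStream(raw, CompressionMode.Decompress);
            case "deflate":
                // DeflateStream reads raw deflate data; a server that sends
                // zlib-wrapped data would need different handling.
                return new DeflateStream(raw, CompressionMode.Decompress);
            default:
                // Empty, unknown, or multiple encodings: leave the stream as-is.
                return raw;
        }
    }

    // Hypothetical helper: the client passed in must NOT have
    // AutomaticDecompression enabled, otherwise the header is already gone.
    public static async Task<(string Content, string Encoding)> FetchAsync(HttpClient client, Uri uri)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, uri))
        {
            request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
            request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("deflate"));

            using (var response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();

                // Read the header before touching the body; it is still
                // present because nothing has decompressed the response yet.
                string encoding = string.Join(",", response.Content.Headers.ContentEncoding);

                using (var raw = await response.Content.ReadAsStreamAsync())
                using (var decoded = WrapForDecoding(raw, encoding))
                using (var reader = new StreamReader(decoded)) // assumes UTF-8 text
                {
                    string content = await reader.ReadToEndAsync();
                    return (content, encoding);
                }
            }
        }
    }
}
```

You would call it with a client built on a handler whose AutomaticDecompression is DecompressionMethods.None, for example `var (content, encoding) = await EncodingAwareDownloader.FetchAsync(client, new Uri("https://www.twitter.com"));`. An empty `encoding` then means the response was not compressed.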
