How to handle Transfer-Encoding: chunked when downloading a file with .NET Core HttpClient.PostAsync

Situation

I am using HttpClient (System.Net.Http, Version=4.2.1.0) to POST an HTTP request with multipart form data to a web API. The form data includes a string parameter (benchmark) and a file (addressFile), which is held in stream. The API call returns a CSV file, which I want to save to disk.

The response contains the header Transfer-Encoding: chunked, and the data in responseBytes includes the chunk headers. I would expect the HttpClient library to strip out these headers, which are metadata rather than part of the actual content. Instead, it simply includes the header rows in the Content.
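To make the framing concrete, here is a minimal sketch of how a body is wrapped in chunked transfer encoding: each chunk is a hexadecimal size line, the chunk data, and a CRLF, and a zero-length chunk closes the body. ChunkedFraming and Encode are hypothetical names used only for illustration; they are not part of System.Net.Http, and the sketch does not zero-pad the size the way the server in this question does.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedFraming
{
    private static readonly byte[] CrLf = { 0x0D, 0x0A };

    // Hypothetical helper: wraps payload in HTTP/1.1 chunked framing,
    // splitting it into chunks of at most chunkSize bytes.
    public static byte[] Encode(byte[] payload, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        var output = new MemoryStream();
        for (int offset = 0; offset < payload.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, payload.Length - offset);

            // Chunk header: the chunk size in hexadecimal, then CRLF.
            byte[] header = Encoding.ASCII.GetBytes(count.ToString("x") + "\r\n");
            output.Write(header, 0, header.Length);

            // Chunk data, followed by CRLF.
            output.Write(payload, offset, count);
            output.Write(CrLf, 0, CrLf.Length);
        }

        // A zero-length chunk followed by an empty line ends the body.
        byte[] terminator = Encoding.ASCII.GetBytes("0\r\n\r\n");
        output.Write(terminator, 0, terminator.Length);
        return output.ToArray();
    }
}
```

Encoding the ASCII bytes of "hello world" with a chunk size of 4 yields size lines of 4, 4 and 3 interleaved with the data, which is the same pattern as the 0fe8 and 0060 lines in the result.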

Question

What is the correct way to handle these chunk headers?

I could of course write a method to handle the headers myself, but I find it hard to believe that the HttpClient library doesn't already have this functionality built in somewhere.

Code

using (var client = new HttpClient())
{
    var content = new MultipartFormDataContent();
    content.Add(new StringContent("Public_AR_Current"), "benchmark");
    content.Add(new ByteArrayContent(stream.ToArray()), "addressFile", "addressFile.csv");

    var response = await client.PostAsync("https://geocoding.geo.census.gov/geocoder/locations/addressbatch", content);

    var responseBytes = await response.Content.ReadAsByteArrayAsync();
    saveResponse(responseBytes);

    var geocodedItems = ParseGeocodeResponse(responseBytes);
    var parsedItems = geocodedItems.Select(gi => gi.ToEpaHandlerUsCensusGeocode());
    return parsedItems;
}

Result

Note the chunk headers on the first and subsequent lines (0fe8, 0060, 0fe8).

0fe8
"AK0000036228","500 HOLLYWOOD DR, ANCHORAGE, AK, 99501","Match","Exact","500 HOLLYWOOD DR, ANCHORAGE, AK, 99501","-149.87424,61.23034","190797469","R"
"AK0000363994","3155 E 18TH CIR, ANCHORAGE, AK, 99508","Match","Non_Exact","3155 E 18TH CIR, ANCHORAGE, AK, 99508","-149.82193,61.20462","190799569","L"
...
0060
28712","N 65 DEG 35 15 W 167 DEG 55 18, WALES, AK, 99734","No_Match"
"AK0000112227","KODIAK ARPR
...
0fe8
T AREA, KODIAK, AK, 99615","No_Match"
"AK0000033902","2130 E DIMOND BLVD, ANCHORAGE, AK, 99515","Match","Non_Exact","2130 W DIMOND BLVD, ANCHORAGE, AK, 99515","-149.91881,61.1375","190795925","L"
"AK0000562769","3100 TONGASS AVE, KETCHIKAN, AK, 99901-5746","No_Match"

Expected Result

I would expect the chunk headers to be stripped out by the HttpClient library.

"AK0000036228","500 HOLLYWOOD DR, ANCHORAGE, AK, 99501","Match","Exact","500 HOLLYWOOD DR, ANCHORAGE, AK, 99501","-149.87424,61.23034","190797469","R"
"AK0000363994","3155 E 18TH CIR, ANCHORAGE, AK, 99508","Match","Non_Exact","3155 E 18TH CIR, ANCHORAGE, AK, 99508","-149.82193,61.20462","190799569","L"
"AK0000228718","1050 ASPEN ST, FAIRBANKS, AK, 99709-5501","Match","Exact","1050 ASPEN ST, FAIRBANKS, AK, 99709","-147.7731,64.8535","605310042","L"
"AK0000536714","SMITH COVE IN SMITH LAGOON T74S R86E CRM S17 & 20, KASAAN, AK, 99901","No_Match"
"AK0001413822","USS-12403, N BANK WOOD RIVER, ALEKNAGIK, AK, 99555","No_Match"
"AK0000489567","BREAKWATER BTWN WESTERN AVE & TAIT ST, METLAKATLA, AK, 99926","No_Match"

I ended up writing this extension method, which performs sufficiently well for my use case.

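    // Requires: using System.Globalization; using System.IO;
    //           using System.Net.Http; using System.Threading.Tasks;
    // Must be declared inside a static class, since it is an extension method.
    public static async Task<Stream> ReadAsStreamAsync(this HttpContent content, bool isChunked)
    {
        var stream = await content.ReadAsStreamAsync();
        if (!isChunked)
        {
            return stream;
        }

        var outputStream = new MemoryStream();

        // No using() so that we don't dispose stream or outputStream.
        var tr = new StreamReader(stream);
        var tw = new StreamWriter(outputStream);

        // Chunk sizes count bytes but this reads chars, so it is only
        // correct for single-byte (e.g. ASCII) payloads.
        string chunkSizeStr;
        while ((chunkSizeStr = tr.ReadLine()) != null)
        {
            var chunkSize = int.Parse(chunkSizeStr.Trim(), NumberStyles.HexNumber);
            if (chunkSize == 0)
            {
                break; // The zero-length chunk terminates the body.
            }

            // Size the buffer per chunk so a large chunk cannot overflow it.
            var buffer = new char[chunkSize];
            var read = tr.ReadBlock(buffer, 0, chunkSize);
            tw.Write(buffer, 0, read);
            tr.ReadLine(); // Consume the CRLF that follows the chunk data.
        }

        // Flush buffered text and rewind so callers read from the start.
        tw.Flush();
        outputStream.Position = 0;
        return outputStream;
    }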
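Because chunk sizes are byte counts, a byte-oriented variant avoids the single-byte-encoding limitation of reading through a StreamReader. The following is a minimal sketch under that assumption, not a replacement for proper HTTP handling: ChunkedDecoder and its Decode method are hypothetical names, chunk extensions after a semicolon are ignored, and trailer headers after the final chunk are not read.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class ChunkedDecoder
{
    // Hypothetical helper: strips chunk framing from input and returns
    // the payload in a MemoryStream positioned at the start.
    public static MemoryStream Decode(Stream input)
    {
        var output = new MemoryStream();
        while (true)
        {
            string sizeLine = ReadLine(input);
            if (sizeLine == null)
            {
                break; // Input ended without a terminating chunk.
            }

            // Ignore any chunk extension after ';'.
            int semicolon = sizeLine.IndexOf(';');
            if (semicolon >= 0)
            {
                sizeLine = sizeLine.Substring(0, semicolon);
            }

            int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            if (size == 0)
            {
                break; // The zero-length chunk terminates the body.
            }

            // Stream.Read may return fewer bytes than requested, so loop.
            var buffer = new byte[size];
            int read = 0;
            while (read < size)
            {
                int n = input.Read(buffer, read, size - read);
                if (n == 0)
                {
                    throw new EndOfStreamException("Chunk data ended early.");
                }
                read += n;
            }

            output.Write(buffer, 0, size);
            ReadLine(input); // Consume the CRLF that follows the chunk data.
        }

        output.Position = 0;
        return output;
    }

    // Reads ASCII bytes up to LF, dropping CR; returns null at end of input.
    private static string ReadLine(Stream input)
    {
        var line = new StringBuilder();
        bool any = false;
        int b;
        while ((b = input.ReadByte()) != -1)
        {
            any = true;
            if (b == '\n') break;
            if (b != '\r') line.Append((char)b);
        }
        return any ? line.ToString() : null;
    }
}
```

One way to decide whether to call it is to check response.Headers.TransferEncodingChunked == true and pass the stream from response.Content.ReadAsStreamAsync() only in that case; whether the framing is actually still present in the content depends on the server and should be verified first.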

If the response is text only, HttpClient handles it automatically (this applies only when reading the content as a string):

string result = await response.Content.ReadAsStringAsync();
