Tried to parse chunked transfer encoding, but the decoded file is totally unreadable

I tried to parse data that a REST API sends using chunked transfer encoding. When I print the value as a string I can see that it contains data, so I thought it should work, but when I write the value to a file, the file is completely unreadable. The code below uses the Boost library, and I explain my thinking in the comments. We start from the response-handling portion of my code. I have no idea what I did wrong.

    // Send the request.
    boost::asio::write(socket, request);

    // Read the response status line. The response streambuf will automatically
    // grow to accommodate the entire line. The growth may be limited by passing
    // a maximum size to the streambuf constructor.
    boost::asio::streambuf response;
    boost::asio::read_until(socket, response, "\r\n");

    // Check that response is OK.
    std::istream response_stream(&response);
    std::string http_version;
    response_stream >> http_version;
    unsigned int status_code;
    response_stream >> status_code;
    std::string status_message;
    std::getline(response_stream, status_message);
    if (!response_stream || http_version.substr(0, 5) != "HTTP/")
        //std::cout << "Invalid response\n";
        return 9002;
    if (status_code != 200)
        //std::cout << "Response returned with status code " << status_code << "\n";
        return 9003;
    // Read the response headers, which are terminated by a blank line.
    boost::asio::read_until(socket, response, "\r\n\r\n");

    // Process the response headers.
    // This part tries to extract the file name from the Content-Disposition header of the response.
    std::string header;
    std::string fullHeader = "";
    string zipfilename="", txtfilename="";
    bool foundfilename = false;
    while (std::getline(response_stream, header) && header != "\r")
    {
        std::transform(header.begin(), header.end(), header.begin(),
            [](unsigned char c){ return std::tolower(c); });
        string containstr = "content-disposition";
        string containstr2 = "filename";
        string quotestr = "\"";
        if (header.find(containstr) != std::string::npos && header.find(containstr2) != std::string::npos)
        {
            int countquotes = 0;
            bool foundquote = true;
            std::size_t startpos = 0, beginpos = 0, endpos = 0;
            while (foundquote)
            {
                std::size_t myfound = header.find(quotestr, startpos);
                if (myfound != std::string::npos)
                {
                    if (countquotes % 2 == 0)
                        beginpos = myfound;
                    else
                    {
                        endpos = myfound;
                        foundfilename = true;
                    }
                    startpos = myfound + 1;
                }
                else
                    foundquote = false;
                countquotes++;
            }
            if (endpos > beginpos && foundfilename)
            {
                size_t zipfileleng = endpos - beginpos;
                zipfilename = header.substr(beginpos+1, zipfileleng-1);
                txtfilename = header.substr(beginpos+1, zipfileleng-5);
            }
            else
                return 9004;
        }
    }
    if (foundfilename == false || zipfilename.length() == 0 || txtfilename.length() == 0)
        return 9005;

    // Once the zip file name has been found, read the data from the response body.
    // The response uses chunked transfer encoding, which did not look complicated
    // from the Wikipedia description: one line holds the length of the data, the
    // next line holds the data, and this repeats. So I split the whole response
    // body on "\r\n" into a vector<string> and read the data from that vector.

    // Write whatever content we already have to output.
    std::string fullResponse = "";
    if (response.size() > 0)
    {
        std::stringstream ss;
        ss << &response;
        fullResponse = ss.str();
    }
    // Split the entire response body into a vector<string>.
    vector<string> allresponsedata;
    split_regex(allresponsedata, fullResponse, boost::regex("(\r\n)+"));
    // Merge the response data.
    string zipfiledata;
    int myindex = 0;
    for (auto &x : allresponsedata) {
        std::cout << "Split: " << x << std::endl; // printing x here does show data
        if (myindex % 2 != 0)
            zipfiledata = zipfiledata + x; // accumulate the data
        myindex++;
    }
    // Write the data to a file.
    std::ofstream zipfilestream(zipfilename, ios::out | ios::binary);
    zipfilestream.write(zipfiledata.c_str(), zipfiledata.length());

    // The zip file gets created, but it cannot be opened; the zip utility reports it as damaged.

I also tried another approach, the one from "slow http client based on boost::asio - (Chunked Transfer)", but that does not work either. VS says:

  1 IntelliSense: no instance of overloaded function "boost::asio::read" matches the argument list
        argument types are: (boost::asio::ip::tcp::socket, boost::asio::streambuf, boost::asio::detail::transfer_exactly_t, std::error_code)    

It fails to compile on this line:

size_t n = asio::read(socket, response, asio::transfer_exactly(chunk_bytes_to_read), error);

I have read the documentation for asio::transfer_exactly, but it has no example quite like this: https://www.boost.org/doc/libs/1_57_0/doc/html/boost_asio/reference/transfer_exactly.html

Any ideas?

I don't see you reading the format correctly: https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Format

You need to read the chunk length (in hex) and any optional chunk extensions before accumulating the full response body.

This has to happen first, because the sequence \r\n that you split on can easily appear inside the chunk data.
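
To make the format concrete, here is a minimal Boost-free sketch of a decoder that reads each hex chunk length first and then copies exactly that many bytes. It assumes the whole chunked body is already in memory as a std::string; the names `hex_value` and `decode_chunked` are hypothetical, and trailer fields after the last chunk are simply ignored. It is an illustration of the idea, not a replacement for a real HTTP parser.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if `c` is not a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes a complete chunked body held in `raw` into `body`.
// Returns false if the input is malformed or truncated. Trailer fields after the
// last chunk are ignored in this sketch.
bool decode_chunked(const std::string& raw, std::string& body) {
    const std::size_t max_size = std::numeric_limits<std::size_t>::max();
    body.clear();
    std::size_t pos = 0;
    for (;;) {
        // The size line is the only place where we search for CRLF.
        const std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) return false;

        // Parse the hexadecimal chunk size at the start of the line.
        std::size_t size = 0;
        std::size_t i = pos;
        for (; i < eol && hex_value(raw[i]) >= 0; ++i) {
            if (size > (max_size >> 4)) return false;  // size would overflow
            size = (size << 4) + static_cast<std::size_t>(hex_value(raw[i]));
        }
        if (i == pos) return false;                    // no hex digits at all
        if (i < eol && raw[i] != ';') return false;    // only extensions may follow

        pos = eol + 2;                                 // first byte of chunk data
        assert(pos <= raw.size());                     // holds: CRLF was found at eol
        if (size == 0) return true;                    // last chunk reached

        // Copy exactly `size` bytes, whatever they contain (CRLF included).
        const std::size_t remaining = raw.size() - pos;
        if (size > remaining || remaining - size < 2) return false;
        body.append(raw, pos, size);
        pos += size;

        // Every chunk is followed by one CRLF that is not part of the data.
        if (raw.compare(pos, 2, "\r\n") != 0) return false;
        pos += 2;
    }
}
```

Note that the only search for \r\n is on the size line; the chunk data itself is copied by length, so a \r\n inside a ZIP payload survives intact.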

Again, I recommend just using Beast's support, which makes it all as simple as:

 http::response<http::string_body> response;
 boost::asio::streambuf buf;
 http::read(socket, buf, response);

And you will have the headers fully parsed and interpreted (including Trailer headers!) and the content in response.body() as a std::string.

It will do the right thing even if the server doesn't use chunked encoding or combines with different encoding options.

There's simply no reason to reinvent the wheel.
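
To tie this back to the goal in the question (saving the ZIP under the name from Content-Disposition), here is a small standard-library sketch. It assumes you already have the header value and the decoded body as std::string, for example from response[http::field::content_disposition] and response.body() in Beast. The helper names `quoted_filename`, `save_binary` and `load_binary` are hypothetical, and the filename match is deliberately simplistic: it is case-sensitive and only handles the quoted form.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns the text between the double quotes that follow
// "filename=", or an empty string if that pattern is not present.
std::string quoted_filename(const std::string& content_disposition) {
    const std::string key = "filename=\"";
    std::size_t begin = content_disposition.find(key);
    if (begin == std::string::npos) return "";
    begin += key.size();
    const std::size_t end = content_disposition.find('"', begin);
    if (end == std::string::npos) return "";
    return content_disposition.substr(begin, end - begin);
}

// Hypothetical helper: writes the bytes unchanged. Binary mode and an explicit
// length matter because ZIP data contains NUL bytes and CR/LF sequences.
bool save_binary(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: reads a whole file back as raw bytes.
std::string load_binary(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

In real code you would also want to sanitize the server-supplied name before using it as a path.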

Full Demo

This demonstrates it with the Chunked Encoding test URL from https://jigsaw.w3.org/HTTP/:

#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>
namespace http = boost::beast::http;
using boost::asio::ip::tcp;

int main() {
    http::response<http::string_body> response;

    boost::asio::io_context ctx;
    tcp::socket socket(ctx);

    connect(socket, tcp::resolver{ctx}.resolve("jigsaw.w3.org", "http"));

    // Send a minimal HTTP/1.1 GET request (version 11 means HTTP/1.1).
    http::request<http::empty_body> request(
                http::verb::get, "/HTTP/ChunkedScript", 11);
    request.set(http::field::host, "jigsaw.w3.org");
    http::write(socket, request);

    // http::read decodes the chunked body while parsing the response.
    boost::asio::streambuf buf;
    http::read(socket, buf, response);

    std::cout << response.body() << "\n";
    std::cout << "Effective headers are:" << response.base() << "\n";
}

Prints:

This output will be chunked encoded by the server, if your client is HTTP/1.1
Below this line, is 1000 repeated lines of 0-9.
...996 lines removed ...

Effective headers are:HTTP/1.1 200 OK
cache-control: max-age=0
date: Wed, 31 Mar 2021 20:09:50 GMT
transfer-encoding: chunked
content-type: text/plain
etag: "1j3k6u8:tikt981g"
expires: Wed, 31 Mar 2021 20:09:49 GMT
last-modified: Mon, 18 Mar 2002 14:28:02 GMT
server: Jigsaw/2.3.0-beta3

