
Why are some servers not using CRLF after the last chunk length of zero?

I'm working with an HTTP request tool (similar to cURL) and having an issue with the server response. Either that, or I'm misunderstanding the HTTP/1.1 RFC on chunked data.

From what I read in the RFC, chunked data should be in this format:

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0\r\n
\r\n
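For illustration, here is a minimal Ruby sketch that builds the example body above from its three pieces. The helper name `encode_chunked` is hypothetical and not from any library. Each chunk is its size in hexadecimal, CRLF, the data, and CRLF; the body then ends with the zero-size last chunk `0\r\n` and the empty line `\r\n`, which is exactly the part the question finds missing.

```ruby
# Hypothetical helper: encode string pieces as an HTTP/1.1 chunked body,
# ending with the zero-size last chunk and the terminating CRLF.
def encode_chunked(pieces)
  body = +""
  pieces.each do |piece|
    next if piece.empty? # a zero-size chunk would end the body early
    body << piece.bytesize.to_s(16) << "\r\n" << piece << "\r\n"
  end
  body << "0\r\n\r\n" # last chunk (no trailers), then the final CRLF
end

encode_chunked(["Wiki", "pedia", " in\r\n\r\nchunks."])
```

The call returns the byte sequence of the expected format above; the size is taken with `bytesize` because chunk sizes count octets, not characters.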

What I'm actually seeing is the following:

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
e\r\n
 in\r\n\r\nchunks.\r\n
0

In other words, the few servers I've tested send no more data after the 0: not even a CRLF, much less CRLFCRLF.

How are we supposed to know where the chunked data ends when the terminating chunk is not in the proper format? Waiting for the CRLFs after the 0 ends in a timeout, and that's not sufficient.

Yes, it violates the standard. But we want to be compatible with as many HTTP servers and clients as possible, so we have to understand how the standard gets violated.

Chunked encoding is often used to stream content over HTTP/1.1, and the standard requires the content to end with a zero-size last chunk followed by an additional CRLF. Consider the following code:

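require "socket"

# Wrong version: if opening or reading the upstream raises, the last line
# is never reached. endpoint is assumed to be a [host, port] pair, and
# more_data is a placeholder for code that yields the upstream data.
def stream(endpoint)
  Socket.tcp(*endpoint) do |socket|
    sleep 10 # simulate a slow upstream

    more_data(socket) do |data|
      print "#{data.bytesize.to_s(16)}\r\n" # chunk size in hex
      print "#{data}\r\n"                    # chunk data + CRLF
    end
  end

  print "0\r\n\r\n" # last chunk + final CRLF
end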

The correct version writes the terminator in an ensure clause:

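require "socket"

# endpoint is assumed to be a [host, port] pair; more_data is a placeholder.
def stream(endpoint)
  Socket.tcp(*endpoint) do |socket|
    sleep 10 # simulate a slow upstream

    more_data(socket) do |data|
      print "#{data.bytesize.to_s(16)}\r\n" # chunk size in hex
      print "#{data}\r\n"                    # chunk data + CRLF
    end
  end
ensure
  # Runs even when Socket.tcp or more_data raises, so the client
  # always receives the last chunk and the final CRLF.
  print "0\r\n\r\n"
end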

In other words, if the input socket is interrupted or any other exception is raised, the wrong version never writes the terminating CRLF to the output socket, so the client keeps waiting for it.
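To see the difference without a network, here is a small runnable Ruby sketch. `flaky_source` and `stream_to` are hypothetical stand-ins, with a `StringIO` in place of the client socket. The source yields one piece and then raises, as an interrupted upstream read would, and the `ensure` clause still writes the terminating chunk.

```ruby
require "stringio"

# Hypothetical stand-in for the upstream: yields one piece of data,
# then fails the way an interrupted socket read would.
def flaky_source
  yield "Wiki"
  raise IOError, "upstream connection reset"
end

# Same shape as the correct stream method, but writing to `out`.
def stream_to(out)
  flaky_source do |data|
    out.write("#{data.bytesize.to_s(16)}\r\n#{data}\r\n")
  end
ensure
  out.write("0\r\n\r\n") # written whether or not flaky_source raises
end

out = StringIO.new
begin
  stream_to(out)
rescue IOError
  # The error still reaches the caller; only the terminator was added.
end
out.string # => "4\r\nWiki\r\n0\r\n\r\n"
```

One trade-off to keep in mind: with the terminator in `ensure`, a body cut short by an upstream failure looks complete to the client; closing the connection without the last chunk is the alternative when the client must be able to detect the truncation.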


Many implementations ignore this violation because they don't need to know the size of the content: they simply read as much data as possible until the socket is closed.
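A minimal Ruby sketch of that lenient approach, assuming the server closes the connection after the response; the method name `read_chunked_until_close` is hypothetical. It reads everything up to EOF, then walks the chunks by their declared sizes. Because EOF already marks the end, a bare `0` with no CRLF after it is accepted instead of causing a timeout.

```ruby
require "stringio"

# Hypothetical lenient decoder: read until the peer closes the
# connection, then decode whatever chunks arrived. A missing CRLF after
# the final "0" is tolerated because EOF already marks the end.
def read_chunked_until_close(io)
  raw = io.read.b # everything up to EOF, as binary
  body = String.new
  pos = 0
  while (eol = raw.index("\r\n", pos))
    # Chunk-size line; anything after ";" is a chunk extension.
    size = Integer(raw[pos...eol].split(";", 2).first.to_s.strip, 16)
    return body if size.zero?
    start = eol + 2
    body << raw[start, size]
    pos = start + size + 2 # skip the data and its trailing CRLF
  end
  # No CRLF after the last size line: accept a bare "0" at EOF.
  rest = raw[pos..].to_s.strip
  return body if rest.empty? || rest == "0"
  raise ArgumentError, "truncated chunked body"
end

read_chunked_until_close(StringIO.new("4\r\nWiki\r\n5\r\npedia\r\n0")) # => "Wikipedia"
```

This only helps when the server actually closes the connection; on a persistent keep-alive connection, `io.read` would wait just as long as a strict reader.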

Use Content-Length whenever you know the length; for a file download, checking the file size costs almost nothing in terms of resources. For chunked transfer, the receiver does not scan the message body for a CRLF pair. It first reads the specified number of bytes, then reads two more bytes to confirm that they are CR and LF. If they are not, the message body is ill-formed: either the size was specified incorrectly or the data was otherwise corrupted.
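The strict reading rule can be sketched in Ruby as follows. `read_chunk` and `read_chunked_strict` are hypothetical names, and the sketch assumes the response carries no trailer fields. On a live socket, a server that stops after a bare `0` leaves `gets` waiting for a CRLF that never arrives, which is the timeout the question describes; on a stream that has already closed, the sketch raises instead.

```ruby
require "stringio"

# Read one chunk: size line, exactly `size` bytes, then CR LF.
# Returns nil at the zero-size last chunk.
def read_chunk(io)
  size_line = io.gets("\r\n") or raise EOFError, "no chunk-size line"
  size = Integer(size_line.chomp("\r\n").split(";", 2).first.to_s.strip, 16)
  return nil if size.zero?

  data = io.read(size)
  raise EOFError, "truncated chunk" if data.nil? || data.bytesize < size
  unless io.read(2) == "\r\n"
    raise ArgumentError, "ill-formed body: wrong chunk size or corrupted data"
  end
  data
end

def read_chunked_strict(io)
  body = String.new
  while (data = read_chunk(io))
    body << data
  end
  # Assumes no trailer fields: the last chunk must be followed by CRLF.
  raise EOFError, "missing final CRLF" unless io.gets("\r\n") == "\r\n"
  body
end

read_chunked_strict(StringIO.new("4\r\nWiki\r\n0\r\n\r\n")) # => "Wiki"
```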

For more information, read the RFC, which says:

A server using chunked transfer-coding in a response MUST NOT use the trailer for any header fields unless at least one of the following is true:

a) the request included a TE header field that indicates "trailers" is acceptable in the transfer-coding of the response, as described in section 14.39; or,

b) the server is the origin server for the response, the trailer fields consist entirely of optional metadata, and the recipient could use the message (in a manner acceptable to the origin server) without receiving this metadata. In other words, the origin server is willing to accept the possibility that the trailer fields might be silently discarded along the path to the client.
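Rule (a) can be checked mechanically. Below is a minimal Ruby sketch with hypothetical method names; rule (b) depends on the origin server's own judgment, so here it is simplified into two caller-supplied flags (`origin_server:` and `optional_metadata_only:`).

```ruby
# Rule (a): does the request's TE header list "trailers"? TE is a
# comma-separated list; codings may carry parameters such as ";q=0.5",
# and tokens are compared case-insensitively.
def te_accepts_trailers?(te_header)
  return false if te_header.nil?
  te_header.split(",").any? { |token| token.split(";", 2).first.to_s.strip.casecmp?("trailers") }
end

# Rule (b) is modelled as flags the origin server sets for itself.
def may_send_trailer_fields?(te_header, origin_server:, optional_metadata_only:)
  te_accepts_trailers?(te_header) || (origin_server && optional_metadata_only)
end
```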

How to determine the message body length:

If a Transfer-Encoding header field is present and chunked is the final encoding, the message body length is determined by reading and decoding the chunked data until the transfer coding indicates the data is complete.

If a Transfer-Encoding header field is present in a response and chunked is not the final encoding, the message body length is determined by reading the connection until it is closed by the server.

If a Transfer-Encoding header field is present in a request and chunked is not the final encoding, the message body length cannot be determined reliably; the server MUST respond with the 400 (Bad Request) status code and then close the connection.

If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling or response splitting and ought to be handled as an error. A sender MUST remove the received Content-Length field prior to forwarding such a message downstream.
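The four rules above can be condensed into a small decision function. This is a minimal Ruby sketch under stated assumptions: header names are lower-cased keys of a Hash, the method names are hypothetical, and the cases these rules do not cover (Content-Length alone, or neither header) are only labelled `:content_length` and `:unspecified` rather than implemented.

```ruby
# Hypothetical decision function for the Transfer-Encoding rules above.
# headers: Hash with lower-case field names; request: true for requests.
def body_length_strategy(headers, request:)
  if (te = headers["transfer-encoding"])
    # Transfer-Encoding overrides Content-Length when both are present.
    codings = te.split(",").map { |c| c.strip.downcase }.reject(&:empty?)
    return :chunked if codings.last == "chunked" # decode until last chunk
    return :bad_request if request               # 400, then close connection
    return :read_until_close                     # response: read until close
  end
  # Cases not covered by the rules quoted above.
  headers.key?("content-length") ? :content_length : :unspecified
end

# A message with both fields is suspect; drop Content-Length before
# forwarding it downstream.
def headers_for_forwarding(headers)
  return headers unless headers.key?("transfer-encoding") && headers.key?("content-length")
  headers.reject { |name, _| name == "content-length" }
end
```

For example, a response with `transfer-encoding: gzip, chunked` yields `:chunked`, while `chunked, gzip` yields `:read_until_close`, because chunked is no longer the final encoding.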
