Is there anything wrong with requiring the client to specify the charset in the HTTP Content-Type header field?

I'm implementing a REST service that accepts POST requests.

The encoding in my system is UTF-8.

I'm using JBoss 5, in which the servlet that receives the requests follows the HTTP/1.1 specification in RFC 2068, which states:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.

So when a client calling my service uses, for example, UTF-8 without specifying a charset, and the POST body contains characters outside US-ASCII, the JBoss servlet assumes "ISO-8859-1", decodes the body incorrectly, and my system receives broken characters. For example, instead of the string "día" I receive "dÂa".
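The mismatch described above can be reproduced outside the servlet container. The following is only a minimal Python sketch of the decoding step, not what JBoss does internally: it encodes the string as UTF-8 and then decodes the same bytes as ISO-8859-1. The two bytes that make up "í" in UTF-8 (0xC3 0xAD) come back as two separate characters, "Ã" followed by a soft hyphen, so how the garbled text looks depends on how it is later displayed.

```python
# Sketch: what happens when UTF-8 bytes are read as ISO-8859-1.
original = "d\u00eda"  # the string "día"

# The client sends the text encoded as UTF-8.
wire_bytes = original.encode("utf-8")

# The server assumes ISO-8859-1 because no charset was declared.
misread = wire_bytes.decode("iso-8859-1")

# ISO-8859-1 maps every byte to one character, so the damage is
# reversible as long as nothing else has altered the string.
repaired = misread.encode("iso-8859-1").decode("utf-8")

print(wire_bytes)    # the raw bytes on the wire
print(len(misread))  # one character more than the original
print(repaired == original)
```

The round trip at the end only works because ISO-8859-1 is a one-to-one byte mapping; it is a diagnostic aid, not a substitute for declaring the charset.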

The approach I found for "protecting" my system is to require the client to specify the charset in the Content-Type header. If no charset is specified, I respond with an HTTP 403 and a message indicating that "the charset value must be specified".
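The check described above might look roughly like the sketch below. It is written in Python for brevity rather than as a servlet, and the function name `check_charset` and its return shape are hypothetical. It reads the charset parameter from the Content-Type value using the standard library's `email.message.Message` parser and mirrors the 403 response and message from the description.

```python
import codecs
from email.message import Message


def check_charset(content_type):
    """Return (status, detail) for a Content-Type header value.

    Hypothetical helper: status 200 carries the declared charset,
    status 403 carries the error text sent back to the client.
    """
    if not content_type:
        return 403, "the charset value must be specified"

    # Reuse the standard library's header parameter parsing.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # None if no charset parameter

    if charset is None:
        return 403, "the charset value must be specified"

    try:
        codecs.lookup(charset)  # reject names Python cannot decode
    except LookupError:
        return 403, "unsupported charset: " + charset

    return 200, charset


print(check_charset("text/plain"))
print(check_charset("text/plain; charset=UTF-8"))
```

The extra `codecs.lookup` step is an assumption beyond the original description: it also rejects a charset that is declared but unknown, since such a body could not be decoded reliably either.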

Is there anything wrong with this approach?

RFC 2068 has been obsoleted twice and really is irrelevant. You need to look at RFC 7231, which doesn't define a default anymore. This means that the default is governed by the definition of the media type.

For text/plain, this implies US-ASCII (as far as I remember), so clients that want to send non-ASCII characters really need to specify the charset.
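On the client side, declaring the charset is a one-line change. The sketch below uses Python's `urllib.request` only to build a request without sending it. The URL is a placeholder assumption, and `build_post` is a hypothetical helper name.

```python
import urllib.request

# Placeholder endpoint, not taken from the question.
SERVICE_URL = "http://localhost:8080/service"


def build_post(text):
    """Build a POST request whose body and declared charset agree."""
    body = text.encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_post("d\u00eda")  # the string "día"
print(request.get_method())
print(request.get_header("Content-type"))  # urllib stores keys capitalized this way
```

The important point is that the encoding used to produce the bytes and the charset named in the header are set in the same place, so they cannot drift apart.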
