POST data will always be transmitted to the server using UTF-8 charset

Question

POST data will always be transmitted to the server using UTF-8 charset. - jQuery.ajax docs

Does this apply only to POST requests made with jQuery (jQuery.post), or also to a plain <form method="post">?

Answer 1

The character encoding of a form POST follows the declared document encoding. For example:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

When a form is submitted, the default content type is application/x-www-form-urlencoded.

An XHR request can override the setting:

xhr.setRequestHeader('Content-Type', 'text/xml; charset=utf-8');

So, to answer your question: it is possible for a form submission to use iso-8859-1 while the Ajax call uses utf-8. Avoid that situation if you can, or your server-side code will get messy.
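To make the mismatch concrete, here is a minimal sketch of how the same field value looks on the wire under the two encodings. It assumes the Ajax side serializes with encodeURIComponent (which always emits UTF-8 bytes and, as far as I know, is what jQuery's serializer is built on). The names encodeFormUtf8 and encodeFormLatin1 are hypothetical helpers written for this illustration; they are not part of jQuery or any browser API.

```javascript
// UTF-8 path: encodeURIComponent always percent-encodes UTF-8 bytes.
function encodeFormUtf8(fields) {
  return Object.entries(fields)
    .map(([name, value]) =>
      encodeURIComponent(name) + "=" + encodeURIComponent(value))
    .join("&")
    .replace(/%20/g, "+"); // form encoding writes spaces as "+"
}

// ISO-8859-1 path (hypothetical helper): one byte per character,
// so only code points up to U+00FF can be represented.
function encodeFormLatin1(fields) {
  const encodeText = (text) =>
    Array.from(String(text), (ch) => {
      const code = ch.codePointAt(0);
      if (code > 0xff) {
        throw new RangeError("not representable in ISO-8859-1: " + ch);
      }
      if (/[A-Za-z0-9\-_.~]/.test(ch)) return ch; // unreserved, left as-is
      if (ch === " ") return "+";
      return "%" + code.toString(16).toUpperCase().padStart(2, "0");
    }).join("");
  return Object.entries(fields)
    .map(([name, value]) => encodeText(name) + "=" + encodeText(value))
    .join("&");
}

const fields = { city: "Málaga" };
console.log(encodeFormUtf8(fields));   // city=M%C3%A1laga
console.log(encodeFormLatin1(fields)); // city=M%E1laga
```

A server that decodes both request bodies with a single charset will get one of them wrong, which is the mess the answer warns about.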

Answer 2

Short answer

In HTML, the charset can be set with the accept-charset attribute on the form. If it is not set, or none of the given values is valid, the user agent falls back to the document's charset and then to UTF-8. jQuery sets the charset through the Content-Type header on the request.

Longer answer w/ sources

All HTML forms may have the attribute accept-charset which, as the HTML spec says:

The accept-charset attribute gives the character encodings that are to be used for the submission. If specified, the value must be an ordered set of unique space-separated tokens that are ASCII case-insensitive, and each token must be an ASCII case-insensitive match for one of the labels of an ASCII-compatible character encoding.

Source, two paragraphs below the big blue thingy.

You may also be interested in §4.10.22.5 Selecting a form submission encoding, emphasis mine:

If the user agent is to pick an encoding for a form, optionally with an allow non-ASCII-compatible encodings flag set, it must run the following substeps:

Let input be the value of the form element's accept-charset attribute.

Let candidate encoding labels be the result of splitting input on spaces.

Let candidate encodings be an empty list of character encodings.

For each token in candidate encoding labels in turn (in the order in which they were found in input), get an encoding for the token and, if this does not result in failure, append the encoding to candidate encodings.

If the allow non-ASCII-compatible encodings flag is not set, remove any encodings that are not ASCII-compatible character encodings from candidate encodings.

If candidate encodings is empty, return UTF-8 and abort these steps.

Each character encoding in candidate encodings can represent a finite number of characters. (For example, UTF-8 can represent all 1.1 million or so Unicode code points, while Windows-1252 can only represent 256.)

For each encoding in candidate encodings, determine how many of the characters in the names and values of the entries in the form data set the encoding can represent (without ignoring duplicates). Let max be the highest such count. (For UTF-8, max would equal the number of characters in the names and values of the entries in the form data set.)

Return the first encoding in candidate encodings that can encode max characters in the names and values of the entries in the form data set.

Source
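The quoted steps can be sketched in a few lines of JavaScript. This is only an illustration under loud assumptions: TOY_ENCODINGS is a made-up table with simplified rules for what each label can represent, and pickEncoding is a hypothetical function name. Real browsers resolve labels through the WHATWG Encoding Standard (where, for example, the label "iso-8859-1" actually resolves to windows-1252), so treat this as a model of the selection logic, not of any browser's implementation.

```javascript
// Toy table (assumption): label -> ASCII-compatibility flag and a predicate
// telling whether the encoding can represent a given code point.
const TOY_ENCODINGS = new Map([
  ["utf-8", { asciiCompatible: true, canEncode: () => true }],
  ["iso-8859-1", { asciiCompatible: true, canEncode: (cp) => cp <= 0xff }],
  ["us-ascii", { asciiCompatible: true, canEncode: (cp) => cp <= 0x7f }],
  ["utf-16", { asciiCompatible: false, canEncode: () => true }],
]);

// entries is an array of [name, value] string pairs (the form data set).
function pickEncoding(acceptCharset, entries, allowNonAsciiCompatible = false) {
  // Split the attribute value on spaces.
  const labels = acceptCharset.trim().split(/\s+/).filter(Boolean);

  // Keep, in order, the labels that resolve to a known encoding.
  let candidates = labels
    .map((label) => label.toLowerCase())
    .filter((label) => TOY_ENCODINGS.has(label));

  // Drop non-ASCII-compatible encodings unless the flag is set.
  if (!allowNonAsciiCompatible) {
    candidates = candidates.filter(
      (label) => TOY_ENCODINGS.get(label).asciiCompatible);
  }

  // No usable candidate: fall back to UTF-8.
  if (candidates.length === 0) return "utf-8";

  // Count how many characters of all names and values each candidate can
  // represent (duplicates included), then return the first one hitting max.
  const chars = entries.flatMap(([name, value]) => [...name, ...value]);
  const counts = candidates.map((label) =>
    chars.filter((ch) =>
      TOY_ENCODINGS.get(label).canEncode(ch.codePointAt(0))).length);
  const max = Math.max(...counts);
  return candidates[counts.indexOf(max)];
}

console.log(pickEncoding("iso-8859-1 utf-8", [["name", "café"]])); // iso-8859-1
console.log(pickEncoding("iso-8859-1 utf-8", [["name", "日本"]])); // utf-8
console.log(pickEncoding("", [["name", "café"]]));                 // utf-8
```

Note how the order of accept-charset matters only for ties: when every listed encoding can represent the whole data set, the first one wins; otherwise the one that covers the most characters does.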
