Matching charset of HTTP Header Content-Type

In JavaScript, I want to get the "charset" parameter of the HTTP header field 'Content-Type'.

The regex I've seen so far is something like:

var charset = (/^charset=(.+)/im).exec(ContentType)[1];

where ContentType contains the value of the Content-Type HTTP header.

But in my testing, the match result is null.
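A likely cause, assuming ContentType holds a value such as 'text/html; charset=utf-8': ^ anchors the match to the start of the string (or, with the m flag, the start of a line), while charset= sits in the middle of the line, so exec returns null. Indexing that null with [1] would then throw a TypeError. The sketch below uses an illustrative header value (the question does not show its exact input) and checks the match before reading the group:

```javascript
// Hypothetical header value; the question does not show its exact input.
var ContentType = 'text/html; charset=utf-8';

// ^ only matches at the start of a line, and "charset=" appears mid-line,
// so this exec() finds nothing and returns null.
var anchored = /^charset=(.+)/im.exec(ContentType);

// Without the anchor the pattern finds "charset=" anywhere in the value.
// Note that (.+) would still swallow any parameters that follow the charset.
var unanchored = /charset=(.+)/i.exec(ContentType);

// Guard against null before reading the capture group.
var charset = unanchored ? unanchored[1] : null;

console.log(anchored); // null
console.log(charset);  // utf-8
```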

Edit: following the answer from @andris leduskrasts, I tried this:

var ctype = 'text/html; charset=utf-8';
// A regex literal keeps \s intact; inside a string literal, '\s' collapses to a plain 's'.
var charset = /charset=.*?(?=$|\s|;|")/.exec(ctype);
system.stdout.writeLine(charset);

I get 'charset=utf-8'. Any idea how to get only 'utf-8'?

If you're fine with the "charset=" part being included in your result, this will do:

charset=.*?(?=\s|;|"|$)

Applied to <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">, it matches charset=ISO-8859-1.

If you want to get rid of the "charset=" part within the regex itself, it's a bit trickier, as JavaScript did not support lookbehinds (lookbehind assertions were only added in ES2018).
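Two ways around that, sketched for a plain header value like the one in the question: read a capture group instead of the whole match (works in every engine), or, in engines that implement ES2018, use a lookbehind assertion. Here a negated character class [^\s;"]* stands in for the lazy .*? plus lookahead; it stops at the same characters (whitespace, ;, " or end of string).

```javascript
var ctype = 'text/html; charset=utf-8';

// Option 1: capture group. Works in every JavaScript engine.
// [^\s;"]* stops at whitespace, a semicolon, a double quote, or the end.
var m = /charset=([^\s;"]*)/i.exec(ctype);
var viaGroup = m ? m[1] : null;

// Option 2: lookbehind (ES2018+). "charset=" is asserted, not consumed,
// so the whole match is just the value.
var lb = /(?<=charset=)[^\s;"]*/i.exec(ctype);
var viaLookbehind = lb ? lb[0] : null;

console.log(viaGroup);      // utf-8
console.log(viaLookbehind); // utf-8
```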

EDIT:

If you want only the utf-8 part, it's easily doable IF your variable is always the content type and therefore ends with the actual charset. In that case, [^\s;=]*?(?=$) will do: it simply selects the last word of your string, i.e. whatever follows the last space, semicolon, or =. This is by no means a good solution for finding the charset in an arbitrary string, but it might do the trick for your particular case.
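A quick check of that caveat, using made-up header values (format=flowed is just an example of a parameter that follows the charset):

```javascript
// The "last word" pattern: the run of characters after the final
// whitespace, semicolon, or "=" at the end of the string.
var lastWord = /[^\s;=]*?(?=$)/;

// Works when the charset is the final parameter...
var ok = lastWord.exec('text/html; charset=utf-8')[0];

// ...but returns the wrong token when another parameter follows it,
// and the whole media type when there is no parameter at all.
var wrong = lastWord.exec('text/plain; charset=utf-8; format=flowed')[0];
var noCharset = lastWord.exec('application/json')[0];

console.log(ok);        // utf-8
console.log(wrong);     // flowed
console.log(noCharset); // application/json
```

Because the quantifier can match zero characters right before the end, exec never returns null here, so the [0] access is safe; the risk is a wrong answer rather than an error.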

This JavaScript library does the job!

content-type: Create and parse HTTP Content-Type header according to RFC 7231

var contentType = require('content-type');
var obj = contentType.parse('image/svg+xml; charset=utf-8');
// obj.type === 'image/svg+xml', obj.parameters.charset === 'utf-8'

Parse a content type string. This will return an object with the following properties (examples are shown for the string 'image/svg+xml; charset=utf-8'):

  • type: The media type (the type and subtype, always lower case). Example: 'image/svg+xml'
  • parameters: An object of the parameters in the media type (parameter names are always lower case). Example: {charset: 'utf-8'}

Throws a TypeError if the string is missing or invalid.

I just experienced the same problem.

If you need to extract just the charset value from an arbitrary Content-Type header (which, per RFC 1341, may contain further characters after the charset assignment), you can use the following JS regexp:

var re = /charset=([^()<>@,;:\"/[\]?.=\s]*)/i;

This works because the matched group starts after the = and excludes the characters that can end the charset specification according to RFC 1341, namely ()<>@,;:\"/[]?.= , whitespace, and (implicitly) the end of the string.
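To see how that pattern behaves, here it is run over a few illustrative header values (the samples are made up, not taken from the question):

```javascript
// The regular expression from the answer above.
var re = /charset=([^()<>@,;:\"/[\]?.=\s]*)/i;

// Hypothetical header values for illustration.
var samples = [
  'text/html; charset=utf-8',
  'text/plain; charset=ISO-8859-1; format=flowed',
  'TEXT/HTML; CHARSET=UTF-8',
  'text/html; charset="utf-8"'
];

// Extract group 1 from each sample, or null when "charset=" is absent.
var results = samples.map(function (ctype) {
  var m = re.exec(ctype);
  return m ? m[1] : null;
});

console.log(results);
```

The first three yield 'utf-8', 'ISO-8859-1' and 'UTF-8' (the i flag handles the upper-case parameter name). The quoted value yields an empty string, because " is one of the excluded characters; RFC 7231 does allow quoted parameter values, so this is worth keeping in mind if your input may contain them.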

Since the charset parameter is optional, you can fall back to a default value with something like:

var m = re.exec(ctype); // null when there is no charset parameter
var charset = m ? m[1] : 'utf8';

or some other default.
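Building on that, here is a hypothetical helper (getCharset is not part of any library) that runs the pattern once, also accepts a quoted value such as charset="utf-8", and returns a caller-supplied default when the header or the parameter is missing. The quoted-string branch is a simplification: it does not process backslash escapes inside the quotes.

```javascript
// Hypothetical helper: returns the charset parameter of a Content-Type
// value, or `fallback` when the header or the parameter is missing.
function getCharset(ctype, fallback) {
  if (typeof ctype !== 'string') {
    return fallback;
  }
  // Group 1: a quoted value (simplified, no escape handling).
  // Group 2: an unquoted token, stopping at the same characters as the
  // answer's regex; "+" requires at least one character.
  var m = /charset=(?:"([^"]*)"|([^()<>@,;:\"/[\]?.=\s]+))/i.exec(ctype);
  if (!m) {
    return fallback;
  }
  // Unmatched groups are undefined, so pick whichever branch matched.
  return m[1] !== undefined ? m[1] : m[2];
}

console.log(getCharset('text/html; charset=utf-8', 'utf8'));   // utf-8
console.log(getCharset('text/html; charset="utf-8"', 'utf8')); // utf-8
console.log(getCharset('application/json', 'utf8'));           // utf8
```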

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 