In JavaScript, I want to get the "charset" parameter of the HTTP 'Content-Type' header field.
The regex I've seen so far is something like:
var charset = (/^charset=(.+)/im).exec(ContentType)[1];
where ContentType contains the value of the Content-Type HTTP header.
But in my testing, the match result is null.
Edit: following the response from @andris leduskrasts, I tried this:
var ctype = 'text/html; charset=utf-8';
var charset = new RegExp('charset=.*?(?=$|\s|\;|\")').exec(ctype);
system.stdout.writeLine(charset);
I get 'charset=utf-8'. Any idea how to get only 'utf-8'?
If you're fine with the "charset=" part being part of your result, this will do:
charset=.*?(?=\s|;|"|$)
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
results in charset=ISO-8859-1.
If you want to get rid of the "charset=" part already in the regex, it's a bit more tricky, as JavaScript did not support lookbehinds before ES2018.
EDIT:
If you want only the utf-8 part, it's easily doable IF your variable is always the content type and, hence, ends with the actual charset. In this case: [^\s;=]*?(?=$), which really just selects the last word of your string, that is, whatever follows the last space, semicolon, or =. This is by no means a good solution for finding the charset in a random string, but it might do the trick for your particular case.
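To make the caveat concrete, here is a minimal sketch comparing the end-anchored pattern with a capturing-group variant. The `boundary=x` parameter and the variable names are made up for illustration; the capturing group is an alternative I'm suggesting here, not something from the patterns above.

```javascript
var ctype = 'text/html; charset=utf-8';

// End-anchored pattern: grabs the last run of characters
// that are not whitespace, ';' or '='.
var lastWord = /[^\s;=]*?(?=$)/;
var fromEnd = lastWord.exec(ctype)[0];

// With a trailing parameter (hypothetical example header) the same
// pattern returns the wrong token.
var wrong = lastWord.exec('text/html; charset=utf-8; boundary=x')[0];

// Alternative sketch: a capturing group keeps "charset=" out of the
// result without needing a lookbehind.
var m = /charset=(.*?)(?=\s|;|"|$)/i.exec('text/html; charset=utf-8; boundary=x');
var viaGroup = m ? m[1] : null;

console.log(fromEnd, wrong, viaGroup);
```

The capturing-group form does not care where in the header the charset appears, so it is probably the safer choice if other parameters may follow.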
This JavaScript library does the job:
content-type: Create and parse HTTP Content-Type header according to RFC 7231
var contentType = require('content-type')
var obj = contentType.parse('image/svg+xml; charset=utf-8')
Parse a content type string. This will return an object with the following properties (examples are shown for the string 'image/svg+xml; charset=utf-8'):
type: The media type (the type and subtype, always lower case). Example: 'image/svg+xml'
parameters: An object of the parameters in the media type (name of parameter always lower case). Example: {charset: 'utf-8'}
Throws a TypeError if the string is missing or invalid.
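Given the properties described above, the charset would be read from obj.parameters.charset. If adding a dependency is not an option, the same result shape can be roughly approximated by hand. The sketch below is a simplified stand-in, not the library's implementation: parseContentType is a hypothetical helper, and it assumes parameter values never contain a ';' inside quotes.

```javascript
// Hypothetical helper, NOT the content-type library's implementation:
// a simplified parser returning the same { type, parameters } shape.
// Assumption: no ';' appears inside a quoted parameter value.
function parseContentType(header) {
  if (typeof header !== 'string' || header.trim() === '') {
    throw new TypeError('content type string is missing or invalid');
  }
  var parts = header.split(';');
  var result = { type: parts[0].trim().toLowerCase(), parameters: {} };
  for (var i = 1; i < parts.length; i++) {
    var eq = parts[i].indexOf('=');
    if (eq === -1) continue; // skip malformed parameters
    var name = parts[i].slice(0, eq).trim().toLowerCase();
    var value = parts[i].slice(eq + 1).trim();
    // Strip one pair of surrounding double quotes, if present.
    if (value.length >= 2 && value[0] === '"' && value[value.length - 1] === '"') {
      value = value.slice(1, -1);
    }
    result.parameters[name] = value;
  }
  return result;
}

var obj = parseContentType('image/svg+xml; charset=utf-8');
console.log(obj.type, obj.parameters.charset);
```

For anything beyond simple headers, the library is likely the better choice, since it follows the RFC grammar rather than a plain split.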
I just experienced the same problem.
If you need to extract just the charset value from an arbitrary content-type header (which permits characters after the charset assignment as per RFC 1341), you can use the following JS regexp:
var re = /charset=([^()<>@,;:\"/[\]?.=\s]*)/i;
This works because the matched group starts after = and excludes the possible endings of the charset specification given in the link; namely ()<>@,;:\"/[]?.=, spaces, and (implicitly) end-of-string.
Since the charset is optional, you can set an appropriate value with something like:
var charset = re.test(ctype) ? re.exec(ctype)[1] : 'utf8';
or some other default.
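Put together, this could be wrapped in a small helper. This is only a sketch: getCharset and its fallback argument are names I made up, and the sample headers are illustrative. It calls exec once instead of test followed by exec.

```javascript
// Same regexp as above: capture everything after "charset=" up to the
// first excluded character, whitespace, or end of string.
var re = /charset=([^()<>@,;:\"/[\]?.=\s]*)/i;

// Hypothetical helper: returns the charset, or the fallback when the
// header carries no charset parameter.
function getCharset(ctype, fallback) {
  var match = re.exec(ctype);
  return match ? match[1] : fallback;
}

var a = getCharset('text/html; charset=ISO-8859-1; boundary=x', 'utf8');
var b = getCharset('text/html', 'utf8');
console.log(a, b);
```

Note that a quoted value such as charset="utf-8" yields an empty string with this pattern, since the double quote is one of the excluded characters.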