
charset detection, meta vs header

Original post 2014-05-29 10:31:14 8 1 php/ html/ utf-8/ character-encoding/ http-headers

We recently ran into trouble when trying to determine the correct encoding used for a page. We encountered a page with the following setup:

header response:

Content-Type:text/html; charset=GBK

meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

The actual content is in GBK, and modern browsers are smart enough to use the right encoding for this page.

But for a crawler (using curl), we have to pick one charset value over the other. So my question is: is taking the header charset over the meta charset the normal thing to do?

(Most content-based encoding detection algorithms we have tried are shaky at best. As long as one declared charset is more reliable than the other, we prefer using a specified charset over anything from our own encoding detection.)

1 answer

Is taking header charset over meta charset the normal thing to do?

Yes. See the HTML specification's algorithm for determining the character encoding.

HTTP headers are checked at step 4. Meta isn't examined until step 5 (if it appears soon enough in the file) or step 9 (otherwise).
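For a crawler, that precedence could be sketched roughly as follows. This is a minimal illustration and not a full implementation of the specification's algorithm: the function names (`charset_from_header`, `charset_from_meta`, `pick_charset`) are hypothetical, the regular expressions are a simplification of the real meta prescan, the 1024-byte window stands in for "soon enough in the file", and the `utf-8` fallback is an assumption.

```python
import re
from typing import Optional

# Simplified patterns (assumption: real-world parsing has more edge cases).
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)
_META_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    if not content_type:
        return None
    match = _CHARSET_RE.search(content_type.encode("ascii", "ignore"))
    return match.group(1).decode("ascii").lower() if match else None


def charset_from_meta(body: bytes, limit: int = 1024) -> Optional[str]:
    """Look for a charset declaration in <meta> tags near the start of the body."""
    for tag in _META_RE.findall(body[:limit]):
        match = _CHARSET_RE.search(tag)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


def pick_charset(content_type: Optional[str], body: bytes, fallback: str = "utf-8") -> str:
    """Prefer the HTTP header, then an early <meta> declaration, then a fallback."""
    return charset_from_header(content_type) or charset_from_meta(body) or fallback


if __name__ == "__main__":
    # Reproduces the setup from the question: header says GBK, meta says utf-8.
    page = (
        b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=utf-8" /></head><body>'
        + "中文".encode("gbk")
        + b"</body></html>"
    )
    charset = pick_charset("text/html; charset=GBK", page)
    print(charset)
    print(page.decode(charset))
```

With the page from the question, the header value wins, so the body is decoded as GBK, which matches the actual content.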





solution1 2 ACCEPTED 2014-05-29 10:35:53
