PHP Tidy and Character Encoding

I am making use of PHP tidy like so:

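// Tidy options: no line wrapping, keep entities as written, keep empty paragraphs.
$config = array(
    'wrap'              => 0,
    'lower-literals'    => 1,
    'preserve-entities' => 1,
    'drop-empty-paras'  => 0
);

$tidy = new tidy;

// The third argument sets the encoding tidy uses for input and output.
$tidy->parseString($html, $config, 'utf8');

$tidy->cleanRepair();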

When I pass in HTML with English text, it comes out fine. With French text, however, it has trouble with the encoding: if I pass in something like vérifier, it appears as vÃ©rifier in the output. How can I get tidy to play nicely with all languages, or at least the Latin ones?

In addition, I will be passing the output of tidy to PHP's DOMDocument. Is there anything I should be careful about here?

It looks very much like the UTF-8 handling is working fine, but the result is being interpreted as Latin-1 instead of UTF-8. Set an appropriate HTTP header or meta tag instructing the browser to read the document as UTF-8.

header('Content-Type:text/html; charset=utf-8');
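To see why this points to a display problem rather than a tidy problem, here is a minimal sketch of what a Latin-1 reader does with correct UTF-8 bytes. It assumes the mbstring extension is available; the variable names are only illustrative and are not part of the original code.

```php
<?php
// "vérifier" written as UTF-8 bytes. The escapes keep this example
// independent of the encoding of the PHP file itself.
$utf8 = "v\xC3\xA9rifier";

// The bytes are valid UTF-8, which is what tidy was asked to emit.
$isValidUtf8 = mb_check_encoding($utf8, 'UTF-8');

// Simulate a reader that wrongly assumes Latin-1: every byte is treated
// as one Latin-1 character, so the two bytes of "é" become "Ã" and "©".
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump($isValidUtf8); // bool(true)
echo $misread, "\n";    // vÃ©rifier
```

If the output of tidy shows this exact pattern, the data is already correct UTF-8 and only the declared charset of the page needs fixing.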
