UTF-8 and HTML entities

I'm trying to extract text from a Word .DOC file with PHP. Everything seems to work, except that I get something like

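&#1057;&#1059;&#1044;&#1054;&#1042;&#1040; &#1041;&#1059;&#1061;&#1043;&#1040;&#1051;&#1058;&#1045;&#1056;&#1030;&#1071;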

instead of Russian text. I've tried html_entity_decode and utf8_encode, but neither helped. Is there a simple solution?

html_entity_decode should work if you pass the target charset explicitly (on PHP 5.4.0 or later UTF-8 is already the default, but naming it does no harm):

html_entity_decode($str, ENT_QUOTES, 'UTF-8')

This converts the character references into UTF-8. Before PHP 5.4.0, the charset parameter defaulted to ISO-8859-1, and in that case the Cyrillic characters can't be converted because the ISO 8859-1 character set doesn't contain them.
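
To see the difference the charset argument makes, here is a minimal sketch. It assumes the extracted text contains decimal character references such as &#1057; and the sample string in $raw is made up for illustration, not taken from a real .DOC file.

```php
<?php
// Hypothetical sample: decimal character references for six Cyrillic letters,
// as they might appear in text extracted from a .DOC file.
$raw = '&#1057;&#1059;&#1044;&#1054;&#1042;&#1040;';

// Name the target charset explicitly so the result does not depend on
// the PHP version's default.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// With ISO-8859-1 as the target, these code points have no representation,
// so the references are left in place.
$latin1 = html_entity_decode($raw, ENT_QUOTES, 'ISO-8859-1');

echo $decoded, PHP_EOL; // СУДОВА
echo $latin1, PHP_EOL;  // the references, unchanged
```

utf8_encode doesn't help here because it only re-encodes ISO-8859-1 bytes as UTF-8. It never touches character references.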
