
PHP parsing Wikipedia content, UTF8 hyphen

I'm currently trying to parse content from de.wikipedia.org with PHP.

After reading the pages with file_get_contents(...) and converting the received content from UTF-8 to ISO-8859-1 with utf8_decode(...), most of the text is displayed and saved correctly. Only some special characters, such as the "long hyphen" (–, the en dash), are not converted and are displayed as garbled characters or as ?.
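A minimal sketch that reproduces the problem without fetching anything, assuming PHP 7.2 or later with the mbstring extension. The sample title string is hypothetical. It shows that utf8_decode() can only map characters up to U+00FF, so the en dash becomes "?", and what the raw UTF-8 bytes look like when a page treats them as Windows-1252 instead. Note that utf8_decode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical sample: an article title containing the en dash U+2013.
$utf8 = "23 \u{2013} Nichts ist so wie es scheint";

// The en dash is three bytes in UTF-8; compare %E2%80%93 in the article URL.
echo bin2hex("\u{2013}"), PHP_EOL; // e28093

// utf8_decode() only maps U+0000..U+00FF to ISO-8859-1 and turns every
// other character into "?". The @ hides the PHP 8.2+ deprecation notice.
$latin1 = @utf8_decode($utf8);
echo $latin1, PHP_EOL; // 23 ? Nichts ist so wie es scheint

// If the raw UTF-8 bytes are instead interpreted as Windows-1252, each byte
// becomes its own character, which is the typical garbled output.
$mojibake = mb_convert_encoding("\u{2013}", 'UTF-8', 'Windows-1252');
echo $mojibake, PHP_EOL; // â€“
```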

This hyphen seems to have the code 150 (0x96, which is its position in Windows-1252; its Unicode code point is U+2013). How can I display it in ISO-8859-1?
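To see where the number 150 comes from, here is a small sketch, assuming the mbstring and iconv extensions are available (iconv backed by glibc). The en dash is U+2013 (8211) in Unicode and the single byte 0x96 (150) in Windows-1252, but ISO-8859-1 has no en dash at all: there, 0x80 to 0x9F are C1 control codes.

```php
<?php
$dash = "\u{2013}"; // EN DASH

// Unicode code point: U+2013 = 8211, not 150.
$codePoint = mb_ord($dash, 'UTF-8');
printf("U+%04X (%d)\n", $codePoint, $codePoint); // U+2013 (8211)

// In Windows-1252 (CP1252) the en dash is the single byte 0x96 = 150.
$cp1252 = iconv('UTF-8', 'CP1252', $dash);
printf("0x%02X (%d)\n", ord($cp1252), ord($cp1252)); // 0x96 (150)

// ISO-8859-1 has no en dash, so with glibc's iconv a strict conversion
// fails (returns false) instead of producing byte 150. Other iconv
// implementations may behave differently; @ hides the notice.
$latin1 = @iconv('UTF-8', 'ISO-8859-1', $dash);
var_dump($latin1); // bool(false)
```

In other words, byte 150 only means "en dash" if the output is declared as Windows-1252, not ISO-8859-1.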

Example: http://de.wikipedia.org/wiki/23_%E2%80%93_Nichts_ist_so_wie_es_scheint

Try using iconv instead:

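// The //TRANSLIT suffix asks iconv to approximate characters that ISO-8859-1 lacks (e.g. the en dash)
$iso = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $utf8);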
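The answer's iconv() call can be wrapped in a small helper; the sketch below does that and, for comparison, adds a second option that is a different technique, not part of the answer. Both function names are hypothetical, and mbstring is assumed for the second one. //TRANSLIT substitutes an approximation (typically "-" for "–"), but the exact substitute depends on the iconv implementation and, with glibc, on the current LC_CTYPE locale. The entity variant keeps the exact character when the output is HTML: a page declared as ISO-8859-1 still renders &#8211; as an en dash.

```php
<?php
// Option A: the iconv() call from the answer, wrapped in a hypothetical helper.
function toLatin1Translit(string $utf8): string
{
    $result = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);
    if ($result === false) {
        // Some iconv builds fail when no transliteration is available.
        throw new RuntimeException('iconv could not convert the input');
    }
    return $result;
}

// Option B (different technique): turn every character above U+00FF into
// an HTML numeric entity such as &#8211;, then convert the Latin-1 rest.
function toLatin1Entities(string $utf8): string
{
    // convmap entries: [start, end, offset, mask]; encodes U+0100..U+10FFFF
    $convmap = [0x100, 0x10FFFF, 0, 0x1FFFFF];
    $withEntities = mb_encode_numericentity($utf8, $convmap, 'UTF-8');
    return mb_convert_encoding($withEntities, 'ISO-8859-1', 'UTF-8');
}

$title = "23 \u{2013} Nichts ist so wie es scheint";
echo toLatin1Entities($title), PHP_EOL; // 23 &#8211; Nichts ist so wie es scheint
```

Option B only helps when the result is rendered as HTML; for plain-text output, transliteration (or keeping the whole pipeline in UTF-8) is the simpler choice.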

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

© 2020-2024 STACKOOM.COM