PHP parsing Wikipedia content, UTF8 hyphen

Question

I'm currently trying to parse content from de.wikipedia.org with PHP.

After reading the pages with file_get_contents(...) and converting the received content from UTF-8 to ISO-8859-1 with utf8_decode(...), most of the content is displayed and saved correctly. Only some special characters, such as the "long hyphen" ( – ), are not converted and are displayed as â€“ or ? .

This hyphen seems to have the Unicode ID 150. How can I display it in ISO-8859-1?

Example: http://de.wikipedia.org/wiki/23_%E2%80%93_Nichts_ist_so_wie_es_scheint

Answer 1

Try using iconv instead:

$iso = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $utf8);
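As a rough sketch of what that call does: the "long hyphen" in the page title is the en dash (U+2013), which ISO-8859-1 does not contain, so //TRANSLIT asks iconv to substitute a similar-looking character. The replacement depends on the iconv implementation PHP was built against (glibc typically produces a plain "-"), so treat the exact output as an assumption. The number 150 from the question matches the en dash's byte value in Windows-1252 (0x96), not in ISO-8859-1; the second conversion below is only an illustration of that, and the sample string and variable names are made up for the example.

```php
<?php
// Sample title containing an en dash (U+2013), written as its UTF-8 bytes E2 80 93.
$utf8 = "23 \xE2\x80\x93 Nichts ist so wie es scheint";

// ISO-8859-1 has no en dash. //TRANSLIT requests an approximation instead of a
// failure; the substitute character is implementation-dependent.
$iso = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $utf8);

// Windows-1252 does include the en dash, at byte 0x96 (decimal 150).
$cp1252 = iconv("UTF-8", "WINDOWS-1252", $utf8);

// iconv() returns false on failure, so check before using the result.
if ($iso === false || $cp1252 === false) {
    fwrite(STDERR, "Conversion failed\n");
    exit(1);
}

echo bin2hex($iso), "\n";    // hex dump of the transliterated ISO-8859-1 bytes
echo bin2hex($cp1252), "\n"; // hex dump of the Windows-1252 bytes
```

If the output is sent to a browser, the charset declared for the page has to match the target encoding, otherwise the converted bytes will again be shown incorrectly.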

solution1
3 2014-04-09 23:24:04