
How to fix encoding while reading a CSV file?

I'm reading a CSV file in PHP, and as far as I understand, such files can be in just about any encoding ever invented. I guess mine is MacRoman-encoded, since I'm working on a Mac.

So far, so good (well, not good at all, but that's another topic). Now, while iterating through the lines, I get a value like:

Z�rich

Obviously, it should be "Zürich"; the "ü" is missing.

I have tried almost everything. mb_detect_encoding returns false, so it cannot tell what the encoding is.

Then I found a great class by Sebastian Grignoli ("Detect encoding and make everything UTF-8").

It seems nice, but all I got is:

ZŸrich

Not really the "ü" I expected.

Then I found out that utf8_encode sort of works; it generates:

Z\u009Frich

But what now? If I put this directly into the database, the final value is "Zrich", which means it is still not really correct UTF-8. Or is the database just struggling with the escaped variant? When I run mb_detect_encoding on that value, it now reports "UTF-8". Nice, but how do I go further? How can I get my "Zürich" correctly into UTF-8?

You can probably use iconv for the conversion. On my installation, the MacRoman encoding is simply called "MAC":

$city = "Z\x9frich";
$city = iconv("MAC", "UTF-8", $city); 
echo $city; // Output: Zürich
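To see why utf8_encode gave "\u009F" instead of "ü", it can help to compare the resulting bytes. utf8_encode always treats its input as ISO-8859-1, where byte 0x9F is a control character, so the result is valid UTF-8 but the wrong character. The sketch below uses a hypothetical helper, hexAfterConversion, and assumes that the iconv extension is available and that your iconv build knows the name "MAC" (other builds may call it "MACINTOSH"):

```php
<?php
// Hypothetical helper: convert $bytes from $from to UTF-8 and return the result as hex.
function hexAfterConversion(string $bytes, string $from): string
{
    $converted = iconv($from, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("iconv could not convert from $from");
    }
    return bin2hex($converted);
}

$raw = "Z\x9frich"; // the bytes as they appear in the CSV file

// What utf8_encode effectively does: interpret the input as ISO-8859-1.
echo hexAfterConversion($raw, 'ISO-8859-1'), "\n"; // 5ac29f72696368 (U+009F, a control character)

// Interpreting the same bytes as MacRoman yields the intended "ü".
echo hexAfterConversion($raw, 'MAC'), "\n";        // 5ac3bc72696368 (U+00FC, "ü")
```

Both results are valid UTF-8, which is why mb_detect_encoding reports "UTF-8" afterwards; only the second one contains the character you actually meant.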

Try converting the whole file with iconv first and importing it afterwards. Alternatively, iterate over every line and convert it with iconv.

Either way, you must know the original encoding of your file.
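The line-by-line variant could look roughly like the following. This is only a sketch: readCsvAsUtf8 is a hypothetical helper name, the comma delimiter and the default source encoding "MAC" are assumptions (check which name your iconv build uses), and error handling is kept minimal.

```php
<?php
// Hypothetical helper: read a CSV file in a known single-byte encoding
// and return all rows with every field converted to UTF-8.
function readCsvAsUtf8(string $path, string $fromEncoding = 'MAC'): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    // Delimiter, enclosure and escape are passed explicitly; adjust them to your file.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        foreach ($row as $i => $field) {
            $converted = iconv($fromEncoding, 'UTF-8', (string) $field);
            if ($converted === false) {
                fclose($handle);
                throw new RuntimeException("Cannot convert field from $fromEncoding");
            }
            $row[$i] = $converted;
        }
        $rows[] = $row;
    }

    fclose($handle);
    return $rows;
}
```

Calling readCsvAsUtf8('cities.csv') (the file name is just an example) would then give you rows that are safe to insert into a UTF-8 database column, provided the connection itself is also set to UTF-8.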
