
How do you convert latin1 to utf8 character encoding?

So, I currently have this problem: I have a SQL database dump whose character encoding is latin1, but some UTF-8 characters in the file show up garbled, e.g. Ä (should be ā), Ä« (should be ī), Å¡ (should be š), Ä“ (should be ē), etc. How do I convert these letters back to the original UTF-8?

Character in the file <-> what it should have been <-> bytes

Ä“ <-> ē <-> 5

Ä <-> ā <-> 2

Å¡ <-> š <-> 4

Ä« <-> ī <-> 4

If you're seeing multiple bytes for what should be single characters, chances are it's already in UTF-8. Bear in mind that ISO-8859-1 is a single-byte-per-character encoding, whereas UTF-8 can take multiple bytes - and any non-ASCII character does take multiple bytes.
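To illustrate the point, here is a minimal Python sketch (Python is my choice for illustration; the question does not mention it). It encodes a character as UTF-8, counts the bytes, and then misreads those bytes as ISO-8859-1, which reproduces the garbled pairs from the question. The helper name `describe` is made up for this example.

```python
def describe(ch: str) -> tuple[int, str]:
    """Return the UTF-8 byte count of ch and how those bytes look
    when they are wrongly interpreted as ISO-8859-1 (latin1)."""
    utf8_bytes = ch.encode("utf-8")
    # latin1 maps every byte to exactly one character, so this never fails;
    # each UTF-8 byte simply becomes its own (wrong) character.
    misread = utf8_bytes.decode("latin-1")
    return len(utf8_bytes), misread


if __name__ == "__main__":
    for ch in ["a", "š", "ī"]:
        size, misread = describe(ch)
        print(f"{ch!r}: {size} byte(s) in UTF-8, shown as {misread!r} when read as latin1")
```

A plain ASCII letter stays one byte and is unaffected, while š and ī each become two bytes and therefore two latin1 characters.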

I suggest you open the file in a UTF-8-aware text editor, and check it there.
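If you would rather check programmatically than by eye, something like the following Python sketch could work. It only tests whether the raw bytes are valid UTF-8; the function name `classify` is hypothetical, and a "utf-8" result is strong evidence rather than proof, since a few latin1 byte sequences also happen to be valid UTF-8.

```python
def classify(data: bytes) -> str:
    """Roughly classify raw bytes as 'ascii', 'utf-8', or 'not utf-8'."""
    if data.isascii():
        # Pure ASCII is identical in latin1 and UTF-8, so nothing to convert.
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding: raises on invalid sequences
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"
```

To use it on the dump, read the file in binary mode first, for example `classify(open(path, "rb").read())`, where `path` is wherever your dump lives.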

The encoding should be set on the connection you use to import the data and on the one you use to read it back out. If both of them are set to UTF-8, you will face no problems.

If, however, you import the data over a latin1 connection and later read it out over a UTF-8 one, you're in a world of trouble.

PHP strings are plain byte sequences, and many of its string functions effectively assume a single-byte encoding such as latin1; however, that isn't necessarily a problem for you.

If you have already imported the data wrongly, I think you would see a lot of ? or � (a diamond containing a question mark) in your output.

But basically, when connecting from PHP, make sure to run SET NAMES 'utf8' as the first thing you do and see if that works.

If the data is still wrong, you could use PHP's utf8_encode / utf8_decode functions to convert the problematic data.

In a working scenario they should never be used though.
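For a one-off repair outside PHP, the same idea can be sketched in Python: turn each garbled character back into the single byte it came from, then decode the resulting bytes as UTF-8. This is a rough sketch under assumptions, not a definitive fix. The function names are made up, and I am assuming the garbling went through Windows-1252 or latin1 (the “ in Ä“ suggests Windows-1252, which is often what "latin1" means in practice).

```python
def to_original_bytes(text: str) -> bytes:
    """Map each character back to the single byte it was decoded from."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")   # handles characters like “ (byte 0x93)
        except UnicodeEncodeError:
            # Bytes undefined in cp1252 (e.g. 0x81) survive as latin1 controls.
            # Raises UnicodeEncodeError for anything above U+00FF.
            out += ch.encode("latin-1")
    return bytes(out)


def repair_mojibake(text: str) -> str:
    """Undo one layer of UTF-8-read-as-latin1 garbling, if there is one."""
    try:
        return to_original_bytes(text).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not garbled in the way we assumed, so leave the text untouched.
        return text
```

For example, `repair_mojibake("Å¡")` gives back š, while text that is already correct is returned unchanged because its bytes do not form valid UTF-8. Try it on a copy of the dump first.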

