Handling Unicode in Python 2.7 when saving a string to a file

Dealing with Unicode is my only real challenge when programming in Python. I had many problems with it in a past project, and I always brute-forced my way out by trying different encodings until something worked. (If there is a tutorial for beginners, it would be very handy.)

For example I have this code:

# -*- coding: utf-8 -*-
string = "Åland Islands"
with open("1.txt", "w") as f:
    f.write(string.decode("utf-8"))

It raises:

  return codecs.utf_8_decode(input, errors, True) 

UnicodeDecodeError: 'utf8' codec can't decode byte 0xc5 in position 0: invalid continuation byte

I tried many encodings to solve this, with no luck.

The coding line only tells the Python interpreter how to interpret the bytes of the source file. It doesn't guarantee that the script actually contains UTF-8-encoded text. In fact, the error message suggests that the file was saved as ISO-8859-1 (Latin-1) text: 0xc5 is the Latin-1 encoding of Å, whereas its UTF-8 encoding is the two bytes 0xc3 0x85.
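To see the difference between the two byte sequences, here is a minimal sketch that should run on both Python 2.7 and Python 3. The text is written with a `\u00c5` escape so that the encoding of the sketch's own file doesn't matter; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# "Åland Islands", with Å written as an escape so this file is pure ASCII.
text = "\u00c5land Islands"

latin1_bytes = text.encode("latin-1")  # starts with the single byte 0xc5
utf8_bytes = text.encode("utf-8")      # starts with the two bytes 0xc3 0x85

print(repr(latin1_bytes))
print(repr(utf8_bytes))

# Decoding Latin-1 bytes as UTF-8 reproduces the error from the question:
# 0xc5 announces a two-byte sequence, but the next byte ("l") is not a
# valid continuation byte.
error_message = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The failing decode here mirrors what `string.decode("utf-8")` does in the question when the script was saved as Latin-1.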

You need to make sure your editor actually saves the file as UTF-8 encoded text, so that the coding line isn't lying to the interpreter.
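If you are unsure how the editor saved a file, one way to check is to read its raw bytes and try to decode them as UTF-8. The following is only a sketch: `is_valid_utf8` is a hypothetical helper, not part of the standard library, and the two temporary files merely simulate the same source line saved in each encoding.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    with io.open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demo: the same source line saved two different ways.
tmp_dir = tempfile.mkdtemp()
source_line = u'string = "\u00c5land Islands"\n'

latin1_path = os.path.join(tmp_dir, "saved_as_latin1.py")
utf8_path = os.path.join(tmp_dir, "saved_as_utf8.py")

with io.open(latin1_path, "wb") as f:
    f.write(source_line.encode("latin-1"))
with io.open(utf8_path, "wb") as f:
    f.write(source_line.encode("utf-8"))

print(is_valid_utf8(latin1_path))  # False
print(is_valid_utf8(utf8_path))    # True
```

A successful decode does not prove the file was meant to be UTF-8 (pure ASCII passes too), but a failure does show that the coding line and the saved bytes disagree.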
