
Python Unicode characters

I know this subject is not new, but I have tried a lot of solutions without success. I am using Python 2.7 (and I am a very inexperienced user). My problem: I read a file:

my_file=open("file")

and then save one line (which contains the word "pitié") into a variable, then print it:

line=my_file.readline()
print line
>> pitié

There I get "pitié" as the result. But if I want to manipulate it, I see that my variable (a string) contains some bytes:

line
>> 'piti\xc3\xa9'

My problem is that when I manipulate this string, I need to have the "é" character, for example to put it in a Flask template. I tried some encode/decode operations, but I am very confused. I get the usual:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x.. in position .: ordinal not in range(...)

What does print do to give the right output?

Thanks!

Welcome to the world of Unicode! Your file is saved in UTF-8, a multibyte encoding, so characters outside the ASCII range of 0-127 take two or more bytes each. A plain open() in Python 2 hands you those raw bytes, which is why "é" shows up as '\xc3\xa9'. Read the file using the codecs or io module and declare the encoding, so that it is read as a Unicode string. On a narrow Python 2 build, every codepoint up to 65535 (U+FFFF) is then a single element of the string, while higher codepoints are stored as surrogate pairs. Switch to Python 3.3+ and every Unicode codepoint is a single element.

Note that the first line of the example below declares the encoding of the source file. It does not have to match the encoding of the data file; it is there so that Python knows how to decode the literal Unicode string u'é' in the source.

#coding: utf8
import io

with io.open('file',encoding='utf8') as my_file:
    line = my_file.readline()
print line
print repr(line)
print line.index(u'é')

Output:

pitié
u'piti\xe9'
4
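
To see why the byte string from the question behaved differently, here is a minimal sketch that starts from the same six bytes. The names raw and text are only illustrative, and the explicit decode('ascii') call is an assumption about what Python 2 attempts when it has to convert a byte string implicitly; it is there to reproduce the error by hand, not to suggest that you should call it.

```python
from __future__ import print_function

raw = b'piti\xc3\xa9'        # the UTF-8 bytes a plain open() returns in Python 2
text = raw.decode('utf-8')   # explicit decode: byte string -> Unicode string

print(len(raw))    # 6: the accented letter occupies two bytes
print(len(text))   # 5: the accented letter is a single codepoint

# Decoding the same bytes as ASCII fails on the first byte above 127,
# which is the UnicodeDecodeError quoted in the question.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii decode failed at position', exc.start)

# Encoding the Unicode string gives back the original bytes.
print(text.encode('utf-8') == raw)
```

The usual pattern is to decode once when data comes in, work with Unicode strings everywhere in between, and encode again only when writing out.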

You're seeing two different display methods: print shows you the pretty version, while just typing line at the interactive prompt gives you the raw "repr" version, in which the two UTF-8 bytes of "é" appear as \xc3\xa9. Nothing is wrong with the string. If you write it to a file, it will be exactly as it was in your original input file.
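
The round trip described here can be checked with a short sketch. It assumes a throwaway file created in a temporary directory (the path and the variable names are hypothetical, not from the question) and uses io.open so that the same code runs on Python 2.7 and Python 3.

```python
from __future__ import print_function
import io
import os
import shutil
import tempfile

original = b'piti\xc3\xa9\n'              # file content as UTF-8 bytes

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'file')      # hypothetical scratch file
try:
    with io.open(path, 'wb') as f:
        f.write(original)

    # Read as text: the two bytes become one Unicode character.
    with io.open(path, encoding='utf-8') as f:
        line = f.readline()

    # repr() shows escapes; it is a display form, not different data.
    print(repr(original))

    # Write the text back; newline='' keeps '\n' untranslated.
    with io.open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(line)

    with io.open(path, 'rb') as f:
        round_tripped = f.read()
finally:
    shutil.rmtree(tmp_dir)

print(round_tripped == original)          # True
```

The escapes printed by repr() differ slightly between Python 2 and Python 3 (the latter adds a b prefix), but the bytes written back to disk are identical to the input either way.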
