Encoding (UTF-8) issue

I want to write the items of a list to a text file, but the encoding is not working: each item is written as a bytes literal.

with open('freq.txt', 'w') as f:
    for item in freq:
        f.write("%s\n" % item.encode("utf-8"))

Output:

b'okul'
b'y\xc4\xb1l\xc4\xb1'

Expected:

okul
yılı

If you are using Python 3, you can declare your desired encoding in the call to open:

with open('freq.txt', 'w', encoding='utf-8') as f:
    for item in freq:
        f.write("%s\n" % item)

If you don't provide an encoding, it will default to the encoding returned by locale.getpreferredencoding().
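
As a quick check of the point above, the following sketch prints the platform's fallback encoding and then writes and reads back the file with an explicit encoding. The contents of freq and the temporary file path are assumptions made for illustration; the question does not show the real list.

```python
import locale
import os
import tempfile

# Sample data, assumed to resemble the list in the question.
freq = ["okul", "yılı"]

# The encoding open() falls back to when no encoding argument is given.
print(locale.getpreferredencoding())

# Write to a temporary location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "freq.txt")
with open(path, "w", encoding="utf-8") as f:
    for item in freq:
        f.write("%s\n" % item)

# Read the raw bytes to see what was actually stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Read the text back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

Passing the encoding explicitly on both the write and the read keeps the result independent of the machine's locale settings.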

The problem with your code is that '%s\n' % item.encode('utf-8') first encodes item to bytes, and the string formatting operation then implicitly calls str on that bytes object. In Python 3, calling str on a bytes object returns its repr, so the b'...' literal itself ends up in the resulting string.

>>> s = 'yılı'
>>> bs = s.encode('utf-8')
>>> bs
b'y\xc4\xb1l\xc4\xb1'
>>> # See how the "b" is *inside* the string.
>>> '%s' % bs
"b'y\\xc4\\xb1l\\xc4\\xb1'"

Making the format string a bytes literal (supported since Python 3.5) avoids this problem

>>> b'%s' % bs
b'y\xc4\xb1l\xc4\xb1'

but then writing to the file would fail because you cannot write bytes to a file opened in text mode. If you really want to encode manually you would have to do this:

# Open the file in binary mode.
with open('freq.txt', 'wb') as f:
    for item in freq:
        # Encode the entire string before writing to the file.
        f.write(("%s\n" % item).encode('utf-8'))
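
To illustrate the difference between the two modes, here is a minimal sketch that tries to write encoded bytes to a text-mode file, catches the resulting TypeError, and then writes the same bytes in binary mode. The temporary path and the sample string are assumptions for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "freq.txt")
data = "yılı\n".encode("utf-8")  # manually encoded bytes

# Text mode accepts only str, so writing bytes raises TypeError.
text_mode_error = None
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)
except TypeError as exc:
    text_mode_error = exc

print(type(text_mode_error).__name__)

# Binary mode accepts the bytes unchanged.
with open(path, "wb") as f:
    f.write(data)

with open(path, "rb") as f:
    stored = f.read()
```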
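
Another option is codecs.open, which takes the encoding as its third argument:

import codecs

with codecs.open("lol", "w", "utf-8") as file:
    # Add "\n" so that each word lands on its own line.
    file.write('Okul\n')
    file.write('yılı\n')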
