
Replacing unencodable characters

I'm currently working on something where I need to pull some XML from a website and work with it.

Everything works fine, but if I try to print the XML (or the text after parsing it) and it contains a character that can't be encoded, I get this error:

return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2665' in position 1161: character maps to <undefined>

Now I want to locate these characters and replace them with a "?", for example.

How do I do this?

Is there a better method for handling these errors?

It would be easier to help if you posted the code that produced that error. In any case, you can usually encode the string as UTF-8 and then decode it:

data = '\u2665'
data = data.encode('utf8')
print(data)  # b'\xe2\x99\xa5'
data_d = data.decode('utf8')
print(data_d)  # ♥
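To get the "?" replacement the question asks for, one option is the errors="replace" argument of str.encode. The sketch below is a minimal illustration, not the asker's code: the helper name replace_unencodable is hypothetical, and cp1252 is only an assumed stand-in for whatever 'charmap' code page the console actually uses.

```python
def replace_unencodable(text: str, encoding: str = "cp1252") -> str:
    """Return text with every character `encoding` cannot represent replaced by '?'.

    `encoding` defaults to cp1252 as an assumption; pass the real console
    encoding (for example sys.stdout.encoding) when it is known.
    """
    # errors="replace" substitutes '?' for each unencodable character
    # instead of raising UnicodeEncodeError.
    encoded = text.encode(encoding, errors="replace")
    # Decoding with the same codec yields a str that is safe to print.
    return encoded.decode(encoding)


sample = "I \u2665 XML"
print(replace_unencodable(sample))  # I ? XML
```

Characters the target encoding does support are left unchanged, so only the problematic ones are lost.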

You can also add this line at the top of your script to declare the encoding of the source file:

# -*- coding: utf-8 -*-

and then check which encoding stdout actually uses, since that is the codec print() encodes with:

import sys
print(sys.stdout.encoding)
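If the reported encoding turns out to be a limited code page, another approach is to make the output stream itself replace unencodable characters instead of raising. The following is a rough sketch under assumptions: make_lenient_writer is a hypothetical helper, cp1252 is an assumed encoding, and an in-memory buffer stands in for the real console so the result can be inspected.

```python
import io


def make_lenient_writer(binary_stream, encoding="cp1252"):
    """Wrap a binary stream in a text writer that emits '?' for unencodable characters.

    Hypothetical helper; cp1252 is an assumed stand-in for the console encoding.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="replace",  # substitute '?' instead of raising UnicodeEncodeError
        newline="\n",      # keep line endings unchanged across platforms
    )


buffer = io.BytesIO()            # stands in for the console in this sketch
writer = make_lenient_writer(buffer)
print("I \u2665 XML", file=writer)
writer.flush()
print(buffer.getvalue())         # b'I ? XML\n'

# On Python 3.7+, the real stdout can usually be switched the same way with:
#     sys.stdout.reconfigure(errors="replace")
```

This keeps the original strings intact in memory and only degrades them at the moment they are written out.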

