
Python saving string to file. Unicode error

I am extracting data from a Google spreadsheet using the Spreadsheet API in Python. I can print every row of the spreadsheet on the command line with a for loop, but some of the text contains symbols, e.g. the degree Celsius symbol (a little circle). As I print these rows, I also want to write them to a file, but I get various Unicode errors when I do this. I tried fixing it by replacing the characters manually, but there are too many of them:

current=current.replace(u'\xa0',u'')
current=current.replace(u'\u000a',u'p')
current=current.replace(u'\u201c',u'\"')
current=current.replace(u'\u201d',u'\"')
current=current.replace(u'\u2014',u'-')

What can I do to avoid these errors? For example:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 1394: ordinal not in range(128)

current=current.replace(u'\u0446',u'u')

You want to decode it from whatever encoding it's in:

decoded_str = encoded_str.decode('utf-8')

For more information on how to deal with unicode strings, you should go over http://docs.python.org/howto/unicode.html
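Since the error in the question is a UnicodeEncodeError raised while writing, another option is to open the output file with an explicit encoding so that no implicit ASCII conversion takes place. The following is only a minimal sketch, assuming the rows are already unicode strings; write_rows, read_rows, the sample rows and the temporary path are hypothetical names made up for illustration. io.open accepts an encoding argument in both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_rows(path, rows):
    """Write each unicode row to `path` as UTF-8, one row per line."""
    with io.open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(row + u"\n")


def read_rows(path):
    """Read the file back as unicode strings, without trailing newlines."""
    with io.open(path, "r", encoding="utf-8") as f:
        return [line.rstrip(u"\n") for line in f]


# Hypothetical sample rows containing the characters from the question.
rows = [u"25\u00b0C", u"\u201cquoted\u201d", u"a\u2014b", u"non\xa0breaking"]
path = os.path.join(tempfile.mkdtemp(), "rows.txt")
write_rows(path, rows)
```

With this approach the symbols are kept in the file instead of being replaced or dropped.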

import unicodedata
decoded = unicodedata.normalize('NFKD', encoded).decode('UTF-8', 'ignore')

I'm not quite sure that the normalize step is needed in this case. Also, the 'ignore' option means that you might lose some information, because characters that cause errors are silently dropped.
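To make the information loss concrete, here is a small sketch of the normalize-then-ignore idea. It is written with encode to ASCII rather than decode, which is an assumption about the intent, since unicodedata.normalize expects text rather than bytes. The helper name to_ascii_lossy is hypothetical.

```python
import unicodedata


def to_ascii_lossy(text):
    """Decompose characters (NFKD), then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    # 'ignore' silently discards every character ASCII cannot represent.
    return decomposed.encode("ascii", "ignore").decode("ascii")


accented = to_ascii_lossy(u"caf\u00e9")    # accent is split off, then dropped
degrees = to_ascii_lossy(u"25\u00b0C")     # degree sign has no ASCII form
nbsp = to_ascii_lossy(u"a\xa0b")           # NBSP decomposes to a plain space
```

Normalization helps for characters that decompose into an ASCII base (accented letters, the non-breaking space), but symbols such as the degree sign simply disappear.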

''.join(c for c in current if ord(c) < 128)
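The one-liner above can be wrapped in a function to see what it removes. This is just an illustrative sketch, assuming current is a unicode string; strip_non_ascii and the sample value are hypothetical.

```python
def strip_non_ascii(text):
    """Keep only characters whose code point is below 128."""
    return u"".join(c for c in text if ord(c) < 128)


# Hypothetical sample: degree sign and curly quotes around ASCII text.
sample = u"25\u00b0C \u201chot\u201d"
stripped = strip_non_ascii(sample)

# Encoding to ASCII with 'ignore' drops the same characters.
same = sample.encode("ascii", "ignore").decode("ascii")
```

Like the 'ignore' option, this discards the symbols entirely, so it only fits cases where losing them is acceptable.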


 