Convert a list of strings to Unicode characters in Python

My Python code uses the Facebook API to request a user's info, and the name can contain Unicode characters:

# -*- coding: utf-8 -*-
from facebook import Facebook

def desktop_app():
    # Get api_key and secret_key from a file
    facebook = Facebook('x', 'xx')
    facebook.auth.createToken()
    # Show login window
    facebook.login()
    # Login to the window, then press enter
    print 'After logging in, press enter...'
    raw_input()
    facebook.auth.getSession()
    info = facebook.users.getInfo([facebook.uid], [u'name', 'birthday', 'affiliations', 'sex'])[0]
    for attr in info:
        print '%s: %s'.encode('ascii') % (attr, info[attr])
    friends = facebook.friends.get()
    friends = facebook.users.getInfo(friends[0:5], [u'name', 'birthday', 'relationship_status'])
    for friend in friends:
        if 'birthday' in friend:
            print friend['name'].encode('utf8'), 'has a birthday on', friend['birthday'], 'and is', friend['relationship_status']
        else:
            print friend['name'].encode('utf8'), 'has no birthday and is', friend['relationship_status']
    arefriends = facebook.friends.areFriends([friends[0]['uid']], [friends[1]['uid']])

if __name__ == "__main__":
    desktop_app()

I get this error when the Facebook name contains Unicode characters:

  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0169' in position 7: character maps to <undefined>

Thanks in advance if you can help me fix that! :)

The quick and dirty answer is to use somestring.encode('ascii', 'ignore') to handle unexpected characters. Note that this silently drops every character ASCII cannot represent.
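As a rough illustration of what that call does (the sample name below is made up for the example, not taken from the Facebook API), here is a minimal sketch comparing the 'ignore' error handler with 'replace', which substitutes a question mark instead of dropping the character:

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value; any name with a non-ASCII character behaves the same way.
name = u'ba\u0169er'

dropped = name.encode('ascii', 'ignore')    # unencodable characters are discarded
replaced = name.encode('ascii', 'replace')  # unencodable characters become '?'

print(repr(dropped))
print(repr(replaced))
```

Either way the original character is gone, which is why this is only a stopgap.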

I suspect your code has deeper problems, though. If you're printing real unicode strings, you don't have to encode them first (otherwise, their meaning will be lost before print gets to them):

>>> print u'ba\u0169er'     # no encode or decode is needed to print
baũer

Also, the line print '%s: %s'.encode('ascii') % (attr, info[attr]) is encoding the template before any string substitution has taken place. That likely isn't what you intended.
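One way to get the order right is to do the substitution while everything is still Unicode text and encode the finished line afterwards, if at all. This is only a sketch: the info dict is a hypothetical stand-in for what the API call in the question returns, and UTF-8 is an assumed target encoding.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the dict returned by the API call in the question.
info = {u'name': u'ba\u0169er'}

for attr in info:
    # Substitute first, while the template and the values are all Unicode text...
    line = u'%s: %s' % (attr, info[attr])
    # ...then encode the finished line once, only if bytes are really needed.
    encoded = line.encode('utf-8')
    print(repr(encoded))
```

Using a Unicode template (u'%s: %s') keeps the whole expression in one string type, so no implicit ASCII conversion happens during formatting.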

The problem is that your console doesn't support one or more of the characters you're receiving. You can execute chcp 65001 to make the console support UTF-8 (and as a side bonus, you don't have to encode manually), but this may have an adverse effect on other programs run from the same console.
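If changing the code page is not an option, another workaround is to degrade gracefully: re-encode the text with the console's own encoding and the 'replace' error handler, so unsupported characters show up as '?' instead of raising UnicodeEncodeError. A minimal sketch, where to_console_text is a hypothetical helper name and cp437 is passed explicitly only to mimic the console from the traceback:

```python
import sys

def to_console_text(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # sys.stdout.encoding can be None (for example when output is piped),
    # so fall back to ASCII in that case.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, 'replace').decode(encoding)

# cp437 has no mapping for U+0169, so that character is replaced.
print(to_console_text(u'ba\u0169er', 'cp437'))
```

Called without the second argument, the helper uses whatever encoding the current console reports.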

The easiest solution is to use an IDE that supports UTF-8, such as Pythonwin, which comes with the pywin32 extensions. Leave your strings in Unicode and just print them, and they will display properly on a UTF-8 terminal (as long as the font supports the characters, of course).
