Unicode Characters in Twitter (Python)

I've learned how to send tweets with Python, but I'm wondering if it's possible to send emojis or other special Unicode characters in the tweets.

For example, when I try to tweet u'1F430', it simply shows up as "1F430" in the tweet.

u'1F430' is the literal string "1F430". What character are you trying to get? In general, you can put literal bytes into a Python string with an escape such as "\x20", e.g.:

>>> print(b"#\x20#")
# #

This is the byte with hexadecimal value 20 (decimal 32) between two hashes. In Python 2 the bytes are written out as-is, and ASCII character 0x20 is a space; Python 3 prints the repr b'# #' instead.

>>> print(u"#\u0020#")
# #
>>> print(u"#\U0001F430#")
#🐰#

The first is Unicode code point U+0020 (a single space) between two hashes; the second is U+1F430 (RABBIT FACE) between two hashes.

See https://docs.python.org/3.3/howto/unicode.html for more info. NB: It can get a little confusing, since Python 2 implicitly converts between bytes and unicode (using the ASCII encoding) in a lot of cases, which can hide the issue from you for a while.
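To avoid relying on implicit conversion, you can encode and decode explicitly. The following is a minimal sketch, assuming UTF-8 is the encoding you want; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# Explicit conversion between Unicode text and bytes (no implicit ASCII step).
rabbit = u"\U0001F430"                   # one Unicode character, U+1F430

utf8_bytes = rabbit.encode("utf-8")      # text -> bytes
round_trip = utf8_bytes.decode("utf-8")  # bytes -> text

print(repr(utf8_bytes))                  # four bytes in UTF-8
print(round_trip == rabbit)
```

Being explicit about the encoding at each boundary makes it obvious whether a given value holds text or bytes.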

>>> len(u'1f430')
5
>>> len(u'\U0001F430') 
1 # the latter might be equal to two in Python 2 on a narrow build (Windows, OS X)

The former is 5 characters, the latter is a single character.
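If all you have is the hexadecimal code point as text (such as "1F430"), you can convert it to the actual character yourself. This is a sketch that assumes Python 3, where chr() accepts any code point; the helper name char_from_hex is hypothetical and not part of any library.

```python
def char_from_hex(code_point_hex):
    """Return the one-character string for a hex code point such as '1F430'."""
    return chr(int(code_point_hex, 16))


rabbit = char_from_hex("1F430")

print(len("1F430"))             # the five-character literal string
print(len(rabbit))              # a single character on Python 3
print(rabbit == "\U0001F430")   # same as the \U escape
```

On Python 2 you would need unichr() instead, and on a narrow build it rejects code points above U+FFFF, so the \U escape is the safer choice there.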

If you want to specify the character in Python source code, then you could use its name for readability:

>>> print(u"\N{RABBIT FACE}")
🐰

Note: this might not work in the Windows console. To display non-BMP Unicode characters there, you could use win-unicode-console + ConEmu.
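The same names are available at runtime through the standard unicodedata module, which can help when the name comes from data rather than from source code. A small sketch, assuming your Python's Unicode database includes RABBIT FACE (it was added in Unicode 6.0):

```python
import unicodedata

# Look a character up by its Unicode name, then go back the other way.
rabbit = unicodedata.lookup("RABBIT FACE")
name = unicodedata.name(rabbit)
code_point = "U+{:04X}".format(ord(rabbit))

print(name)
print(code_point)
```

unicodedata.lookup() raises KeyError for an unknown name, so a typo fails loudly instead of producing the wrong character.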

If you are reading it from a file, the network, etc., then this character is no different from any other: to decode bytes into Unicode text, you should specify a character encoding, e.g.:

import io

with io.open('filename', encoding='utf-8') as file:
    text = file.read()

Which specific encoding to use depends on the source; e.g., see "A good way to get the charset/encoding of an HTTP response in Python".
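For HTTP, the encoding is usually declared in the Content-Type header. The sketch below shows one possible way to extract it with the standard library, not necessarily the approach from the linked question. The function name charset_from_content_type, the UTF-8 fallback, and the sample header value are all assumptions made for illustration.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value.

    Falls back to `default` (an assumption, not a rule) when none is declared.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(failobj=default)


# Hypothetical response: header value and body as they might arrive.
content_type = "text/html; charset=UTF-8"
body = b"\xf0\x9f\x90\xb0"

charset = charset_from_content_type(content_type)
text = body.decode(charset)
print(charset)
print(text == u"\U0001F430")
```

If the header carries no charset, what to fall back to depends on the protocol and content type, so treat the default here as a placeholder.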
