
How can I convert strings like “\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167” to Chinese characters?

I am working on a small tool that requests and decodes a webpage, in which the Chinese characters are stored as strings like

\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167 

in the page source, apparently some kind of Unicode escape. I want to convert them to Chinese characters.

I can do this with the website http://rishida.net/tools/conversion/, but how can I do it in Python?

Those are Unicode codepoints already. They represent Chinese characters, but use escape codes that are easier on the developer:

>>> print u'\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167'
小王子:汉法英对照

You do not have to do anything to convert those; the \uxxxx escape form is simply another way to express the same codepoint. See String Literals:

\uxxxx
Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx
Character with 32-bit hex value xxxxxxxx (Unicode only)

Python interprets those escape codes when reading the source code to construct the unicode value.
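As a small illustration of that point, the sketch below compares the escaped and literal spellings of the same text. It assumes Python 3, where every str literal is Unicode (in Python 2 the u'' prefix would be needed); the variable names are only for illustration.

```python
# Python 3 sketch: escape sequences are resolved when the literal is parsed,
# so both spellings produce the same string object contents.
escaped = '\u5c0f\u738b\u5b50'
literal = '小王子'

same = escaped == literal                 # identical codepoints
first_codepoint = hex(ord(escaped[0]))    # codepoint of the first character
long_form = '\U0000738b'                  # 8-digit form of the same kind of escape

print(same, len(escaped), first_codepoint, long_form == escaped[1])
```

Nothing is converted at run time here; the string already holds the characters by the time the code runs.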

If the data comes not from Python source code but from the web, you likely have JSON data instead, which uses the same escape format:

>>> import json
>>> print json.loads('"\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167"')
小王子:汉法英对照

Note that the value then needs to be part of a larger string, one that at least includes quotes to mark it as a string.
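When the downloaded text contains only the bare escape sequences, one way to apply this is to add the quotes yourself before handing it to json.loads. The helper below is a hypothetical sketch in Python 3, not part of any library; it assumes the fragment contains no unescaped double quotes or control characters, which would make it invalid as a JSON string body.

```python
import json

def decode_json_escapes(raw):
    """Decode literal \\uXXXX sequences by treating raw as a JSON string body.

    Hypothetical helper: assumes raw has no unescaped double quotes
    or control characters.
    """
    return json.loads('"' + raw + '"')

# A raw string keeps the backslashes literal, as they would arrive from the web.
raw = r'\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167'
decoded = decode_json_escapes(raw)
print(decoded)
```

If the page actually returns a complete JSON document, parsing the whole document with json.loads is the simpler route, since the quotes are already there.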

Also note that the JSON string escape format differs from Python's when it comes to non-BMP (supplementary) codepoints; JSON treats those as UTF-16 does, creating a surrogate pair and using two \uxxxx sequences for such a codepoint. In Python you'd use a \Uhhhhhhhh 32-bit hex value.
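The difference can be seen with a single supplementary codepoint. The sketch below assumes Python 3 and uses U+1F600 purely as an example character; it is not taken from the question.

```python
import json

# U+1F600 lies outside the BMP, so JSON spells it as a UTF-16 surrogate pair.
json_text = r'"\ud83d\ude00"'
from_json = json.loads(json_text)

# In a Python literal the same codepoint uses one 8-digit \U escape.
from_python = '\U0001F600'

# json.dumps escapes non-ASCII by default, producing the surrogate pair again.
round_trip = json.dumps(from_python)

print(from_json == from_python, len(from_json), round_trip)
```

json.loads combines the two escapes into one character, so no manual surrogate handling is needed on the Python side.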


