Python 3: decode a UTF-8 value to a string

Hi, I am using Python 3 and I want to convert a UTF-8 value to a string (decode it).

Here is my current code:

s1 = '\u54c7'
print(chr(ord(s1)))  # print 哇

This works when the input is a single character, but how do I convert a whole string?

s2 = '\u300c\u54c7\u54c8\u54c8!!\u300d'
print(chr(ord(s2)))   # Error! I want print "「哇哈哈!!」"

Thanks

Edit:

Hi all, I have updated the question.

Suppose the string I get is "s3", as shown below, and I use replace to change its format,

but printing s3 does not show "哇哈哈!!".

If I initialize s4 with '\u54c7\u54c8\u54c8!!' and print s4,

it looks correct, so how can I fix s3?

s3 = '&#x54c7;&#x54c8;&#x54c8;!!'
s3 = s3.replace("&#x","\\u").replace(";","") # s3 = \u54c7\u54c8\u54c8!!
s4 = '\u54c7\u54c8\u54c8!!'
print(s3)  # \u54c7\u54c8\u54c8!!
print(s4)  # 哇哈哈!!

If you are in fact using Python 3, you don't need to do anything. You can just print the string. You can also copy and paste the literals into a Python string and it will work.

'「哇哈哈!!」' == '\u300c\u54c7\u54c8\u54c8!!\u300d'
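As a minimal sketch of this point, using the two strings from the question: the escapes are already characters by the time the literal has been parsed, so printing needs no extra step. The `ord_failed` flag is only there for illustration.

```python
s1 = '\u54c7'
s2 = '\u300c\u54c7\u54c8\u54c8!!\u300d'

# The escapes were turned into characters when the literals were parsed.
print(s1)       # 哇
print(s2)       # 「哇哈哈!!」
print(len(s2))  # 7

# ord() accepts exactly one character, which is why ord(s2) raises TypeError.
try:
    ord(s2)
    ord_failed = False
except TypeError:
    ord_failed = True
print(ord_failed)  # True
```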

Regarding the updated question, the difference is escaping. When you type a string literal, some character sequences are converted into characters that can't easily be typed or displayed. The string is stored not as the series of characters you typed but as a sequence of values created from characters like 'a', ';', and '\300'. Note that each of those has a len of 1 because each is a single character.
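The len claim can be checked directly. Here is a small sketch (the variable names are made up for illustration) comparing an escape typed in source code with the text that the replace calls from the question build at run time:

```python
# An escape typed in source code is converted when the literal is parsed.
escaped_in_source = '\u54c7'

# The same-looking text built at run time is never parsed as a literal,
# so it stays a backslash followed by 'u54c7'.
built_at_runtime = '&#x54c7;'.replace('&#x', '\\u').replace(';', '')

print(len(escaped_in_source))         # 1
print(len(built_at_runtime))          # 6
print(built_at_runtime == '\\u54c7')  # True
```

This is why s3 prints as raw escape text while s4 prints as characters.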

To actually convert those values you could use eval, the answer provided by Iron Fist, or a library that converts the string you have. I would suggest the last, since the rules surrounding such things can be complex and are rarely covered by simple replacements. I don't recognize this particular escaping pattern, so I can't recommend a specific one, sorry.
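For the specific backslash-u text that the replace calls in the question produce, a regular-expression substitution is one alternative to eval. This is only a sketch, assuming the input contains nothing but four-digit \uXXXX escapes; `decode_u_escapes` is a hypothetical helper name, not something from the question or the answers.

```python
import re

def decode_u_escapes(text: str) -> str:
    """Replace each literal backslash-u plus four hex digits with its character."""
    return re.sub(
        r'\\u([0-9a-fA-F]{4})',
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

# The run-time string from the question: real backslashes, not escapes.
s3 = '\\u54c7\\u54c8\\u54c8!!'
print(decode_u_escapes(s3))  # 哇哈哈!!
```

It does not handle other escape forms or surrogate pairs, which is part of why a dedicated decoder is usually the safer choice.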

Your s3 string looks like HTML character references (text in HTML format), so decode it with html.parser, like this:

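>>> s3 = '&#x54c7;&#x54c8;&#x54c8;!!'
>>> from html.parser import HTMLParser
>>> 
>>> p = HTMLParser()
>>> 
>>> # Note: HTMLParser.unescape() was deprecated in Python 3.4 and removed in 3.9.
>>> p.unescape(s3)
'哇哈哈!!'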

Or, more simply, with html.unescape:

>>> import html
>>> 
>>> html.unescape(s3)
'哇哈哈!!'

Quoting from the Python docs on html.unescape:

html.unescape(s)

Convert all named and numeric character references (e.g. &gt;, &#62;, &#x3e;) in the string s to the corresponding Unicode characters.
...
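A short self-contained check of the quoted behaviour, starting from the entity-encoded string in the question and then trying the three reference forms named in the docs:

```python
import html

# The entity-encoded string from the question, before any replace() calls.
s3 = '&#x54c7;&#x54c8;&#x54c8;!!'
decoded = html.unescape(s3)
print(decoded)  # 哇哈哈!!

# Named, decimal, and hexadecimal references all decode to the same character.
print(html.unescape('&gt; &#62; &#x3e;'))  # > > >
```

Note that this works on the original string; the replace step from the question is not needed.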
