Python: convert Unicode code values to a string without '\u'

Question

In the code below,

text = "\u54c8\u54c8\u54c8\u54c8"

Is there a way to convert the Unicode escapes above so that only the hex value is kept and the "\u" is removed? So "\u54c8" becomes "54c8" instead.

In JavaScript I can do text.charCodeAt(n).toString(16), but I can't figure out the equivalent in Python.

I tried to use a regex to match it:

import re

pattern = re.compile('[\u0000-\uFFFF]')
matches = pattern.finditer(text)

for match in matches:
    print(match)

But all it did was print the character that the Unicode value represents.

Answer 1

You can use a regular list comprehension to map over the 4 characters in text, and use ord to get the ordinal (integer) of each codepoint, then hex() to convert it to hexadecimal. The [2:] slice is required to get rid of the 0x prefix Python would otherwise add.

>>> text = "\u54c8\u54c8\u54c8\u54c8"
>>> text
'哈哈哈哈'
>>> [hex(ord(c))[2:] for c in text]
['54c8', '54c8', '54c8', '54c8']
>>>

You can then use e.g. "".join() if you need a single string.

(Another way to write the comprehension would be to use an f-string and the x hex format:

>>> [f'{ord(c):x}' for c in text]
['54c8', '54c8', '54c8', '54c8']

)
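Putting the pieces above together, a minimal sketch of a helper that returns one string might look like this. The name to_hex_codepoints, its sep parameter, and the zero-padding to four digits are illustrative assumptions, not part of the original answer.

```python
def to_hex_codepoints(text, sep=""):
    """Return each character's code point as lowercase hex, joined by sep."""
    # ord() gives the integer code point; :04x pads to at least 4 hex digits.
    return sep.join(f"{ord(c):04x}" for c in text)


text = "\u54c8\u54c8\u54c8\u54c8"
print(to_hex_codepoints(text))       # one continuous string
print(to_hex_codepoints(text, " "))  # space-separated values
```

Note that ord works on whole code points, while JavaScript's charCodeAt returns UTF-16 code units, so the two can differ for characters outside the Basic Multilingual Plane.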

If you actually have the string \u54c8\u54c8\u54c8\u54c8 with literal backslashes, i.e. "backslash, u, five, four, c, eight" repeated 4 times, you'll need to first decode the backslash escape sequences to get the 4-codepoint string:

>>> import codecs
>>> text = r"\u54c8\u54c8\u54c8\u54c8"
>>> codecs.decode(text, "unicode_escape")
'哈哈哈哈'
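For the literal-backslash case, here is a short sketch of two ways to end up with the hex values. The first follows the decoding step above; the second is an alternative not mentioned in the answer, which reads the digits straight out of the escapes with a regex. Both assume the input contains only ASCII characters and well-formed \uXXXX escapes.

```python
import codecs
import re

raw = r"\u54c8\u54c8\u54c8\u54c8"  # literal backslashes, 24 characters long

# Option 1: decode the escapes, then format each code point as hex.
decoded = codecs.decode(raw, "unicode_escape")
hex_values = [f"{ord(c):x}" for c in decoded]

# Option 2 (assumed alternative): capture the four hex digits of each escape.
hex_from_regex = re.findall(r"\\u([0-9a-fA-F]{4})", raw)

print(hex_values)
print(hex_from_regex)
```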

Answer 2

You can do it like this: either encode to ASCII and ignore (or replace) the non-ASCII characters, or encode to UTF-8.

text = "\u54c8\u54c8\u54c8\u54c8"
utf8string = text.encode("utf-8")
asciistring1 = text.encode("ascii", 'ignore')
asciistring2 = text.encode("ascii", 'replace')
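To make the effect of these calls concrete, the sketch below repeats them and then adds one more step that is not part of this answer: encoding with the unicode_escape codec and stripping the "\u" prefix. That last step is an assumption about what the asker wants, and it only behaves this way for characters from U+0100 to U+FFFF, since the codec writes other characters differently.

```python
text = "\u54c8\u54c8\u54c8\u54c8"

utf8_bytes = text.encode("utf-8")                 # 3 bytes per character here
ascii_ignored = text.encode("ascii", "ignore")    # non-ASCII characters dropped
ascii_replaced = text.encode("ascii", "replace")  # each one becomes b"?"

# Assumed extra step: unicode_escape writes these characters as \uXXXX,
# so removing the "\u" prefix leaves only the hex digits.
escaped = text.encode("unicode_escape").decode("ascii")
hex_only = escaped.replace("\\u", "")
print(hex_only)
```

The first three results are bytes objects holding encoded data, not the hex code values themselves.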

You can refer to https://www.oreilly.com/library/view/python-cookbook/0596001673/ch03s18.html


2 answers

Answer 1: 1 (accepted), 2021-05-31 12:30:56

Answer 2: 0, 2021-05-31 12:52:45
