Using Python 3.8, I would like to convert Unicode notation to Python notation:
s = 'U+00A0'
result = s.lower() # output 'u+00a0'
I want to replace u+ with \u:
result = s.lower().replace('u+','\u')
But I get the error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
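The error is raised when the source is compiled, before `replace` ever runs. Inside an ordinary string literal, `\u` must be followed by exactly four hex digits, so `'\u'` on its own is invalid. A minimal sketch reproducing this with the built-in `compile` (the variable name `source` is only for illustration):

```python
# The Python source text '\u' (quote, backslash, u, quote).
# The doubled backslash below puts a single backslash into `source`.
source = "'\\u'"
try:
    compile(source, "<example>", "eval")
except SyntaxError as exc:
    # Same "truncated \uXXXX escape" message as in the question.
    print("SyntaxError:", exc.msg)

# A doubled backslash, or a raw string, gives a literal backslash instead.
print(len("\\u"), "\\u" == r"\u")  # 2 True
```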
How can I convert the notation U+00A0 to \u00a0?
EDIT:
The reason I wanted to get \u00a0 is to then use the encode method to get b'\xc2\xa0'.
My question: given a string in the notation U+00A0, how can I convert it to the bytes b'\xc2\xa0'?
You are struggling with the representation of something versus its value. You can substitute each u+XXXX sequence with the character it names:
import re
# s must be lowercase here (e.g. s = s.lower()): the pattern only matches "u+" and a-f
re.sub(r"u\+([0-9a-f]{4})", lambda m: chr(int(m.group(1), 16)), s)
For u+00a0 the result is displayed as '\xa0', but the same happens with the literal \u00a0:
s = "\u00a0"
print(repr(s))
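To make the value/representation distinction concrete, here is a small sketch comparing the one-character string with the six-character text `\u00a0` (the name `text` is just for illustration):

```python
s = "\u00a0"           # one character: U+00A0 NO-BREAK SPACE
print(len(s))          # 1
print(repr(s))         # '\xa0'  (how Python chooses to display it)
print(s == "\xa0")     # True: two spellings of the same value
print(ord(s) == 0xA0)  # True

text = "\\u00a0"       # six characters: backslash, u, 0, 0, a, 0
print(len(text))       # 6
print(text == s)       # False: this is text describing the character, not the character
```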
Once you have the proper value as a Unicode string, you can encode it to UTF-8:
s = "\xa0"
print(s.encode('utf8'))
# b'\xc2\xa0'
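Encoding is the only step that produces bytes, and the byte sequence depends on the codec. A short sketch showing the UTF-8 round trip, with Latin-1 included purely for contrast:

```python
s = "\xa0"
utf8 = s.encode("utf-8")
print(utf8)                       # b'\xc2\xa0'  (two bytes in UTF-8)
print(utf8.decode("utf-8") == s)  # True: decoding reverses the encoding
print(s.encode("latin-1"))        # b'\xa0'  (one byte in Latin-1)
```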
So the final answer is:
import re
s = "u+00a0"
s2 = re.sub(r"u\+([0-9a-f]{4})", lambda m: chr(int(m.group(1), 16)), s)
s_bytes = s2.encode('utf8')  # b'\xc2\xa0'
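If the input may be uppercase (as in the question's 'U+00A0') or may name code points above U+FFFF, the same regex idea can be widened. This is a sketch under assumptions: the helper name `unicode_notation_to_bytes` is hypothetical, and accepting 4 to 6 hex digits is a choice made here, not part of the answer above.

```python
import re

# Hypothetical pattern: "U+" or "u+" followed by 4 to 6 hex digits,
# e.g. U+00A0 or U+1F600.
_CODE_POINT = re.compile(r"[Uu]\+([0-9A-Fa-f]{4,6})")

def unicode_notation_to_bytes(text: str, encoding: str = "utf-8") -> bytes:
    """Replace every U+XXXX sequence with its character, then encode."""
    decoded = _CODE_POINT.sub(lambda m: chr(int(m.group(1), 16)), text)
    return decoded.encode(encoding)

print(unicode_notation_to_bytes("U+00A0"))       # b'\xc2\xa0'
print(unicode_notation_to_bytes("a U+1F600 b"))  # b'a \xf0\x9f\x98\x80 b'
```

Two caveats of this sketch: the match is greedy, so hex-looking text directly after the code point (e.g. "U+00A0BC") is swallowed, and values above U+10FFFF make `chr` raise `ValueError`.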
You can also use this:
>>> s = 'U+00A0'
>>> s = s.replace('U+', '\\u').encode().decode('unicode_escape').encode()
>>> s
b'\xc2\xa0'
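One limitation worth knowing: the `\u` escape always consumes exactly four hex digits, so 'U+1F600' would silently decode as U+1F60 followed by a literal '0'. A sketch that zero-pads to the eight-digit `\U` escape instead (the helper name `via_unicode_escape` is hypothetical, and it assumes the input is a single well-formed U+XXXX token):

```python
def via_unicode_escape(notation: str) -> bytes:
    # Hypothetical helper: "U+00A0" -> "\U000000A0" (8 hex digits, zero-padded),
    # so code points above U+FFFF are handled too.
    escaped = "\\U" + notation[2:].rjust(8, "0")
    # unicode_escape interprets backslash escapes in the (ASCII) bytes.
    return escaped.encode("ascii").decode("unicode_escape").encode("utf-8")

print(via_unicode_escape("U+00A0"))   # b'\xc2\xa0'
print(via_unicode_escape("U+1F600"))  # b'\xf0\x9f\x98\x80'
```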
You need to escape the \ in replace with a second \:
result = s.lower().replace('u+','\\u')
print(result)
will give you \u00a0
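A raw string is an equivalent way to write the backslash, since `r'\u'` is not treated as an escape. Note that the result is still six characters of text, not the character U+00A0; turning it into the real character needs a further decode step such as `.encode().decode('unicode_escape')`. A minimal sketch:

```python
s = 'U+00A0'
escaped = s.lower().replace('u+', r'\u')  # r'\u' is backslash followed by u
print(escaped)          # \u00a0
print(len(escaped))     # 6: plain text, not the single character U+00A0
print(r'\u' == '\\u')   # True
```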