Python UTF-8 conversion

Question

I would like to know how to do the following conversion (source -> target) in a Python program.

>>> source = '\\x{4e8b}\\x{696d}'
>>> print source
\x{4e8b}\x{696d}
>>> print type(source)
<type 'str'>
>>> target = u'\u4e8b\u696d'
>>> print target.encode('utf-8')
事業

Thank you.

Answer 1

You can use int (with base 16) and unichr to convert each hex code into a character:

>>> int('4e8b', 16)
20107
>>> unichr(int('4e8b', 16))
u'\u4e8b'
>>> print unichr(int('4e8b', 16))
事
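The same int/unichr idea can be applied to the whole source string. Below is a minimal sketch, assuming Python 3 (where unichr was removed and chr accepts any code point) and assuming the string consists only of escapes of the exact \x{hex} shape shown in the question. decode_braced_escapes is a hypothetical helper name, not part of the original answer.

```python
def decode_braced_escapes(source):
    """Turn text like '\\x{4e8b}\\x{696d}' into the characters it names.

    Hypothetical helper: assumes the string is made only of \\x{HEX} escapes.
    """
    chars = []
    # Splitting on the literal prefix \x{ leaves pieces like '4e8b}'.
    for piece in source.split('\\x{'):
        if not piece:
            continue  # the (empty) text before the first escape
        hex_digits = piece.rstrip('}')
        # int(..., 16) parses the hex code point; chr() is Python 3's unichr().
        chars.append(chr(int(hex_digits, 16)))
    return ''.join(chars)


source = '\\x{4e8b}\\x{696d}'
result = decode_braced_escapes(source)
print(ascii(result))  # prints '\u4e8b\u696d'
```

Because any text before the first escape would be fed to int() as well, this only suits strings made entirely of escapes.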

Answer 2

Building on Blender's idea, you could use re.sub with a callable as the replacement argument, so that every \x{...} escape is converted in one pass:

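import re

def touni(match):
    # match.group(1) is the hex code point, e.g. '4e8b'
    return unichr(int(match.group(1), 16))

source = '\\x{4e8b}\\x{696d}'
# Replace every \x{...} escape with the character it names
print(re.sub(r'\\x\{([\da-f]+)\}', touni, source))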

yields

事業
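In Python 3, where print is a function and unichr no longer exists, the re.sub approach above translates almost directly. A minimal sketch; the upper-case hex range and the final encode step are additions not found in the original answer, and ESCAPE is just an illustrative name:

```python
import re

# Matches a Perl-style escape such as \x{4e8b}; group 1 holds the hex digits.
# [0-9a-fA-F] also accepts upper-case digits such as \x{4E8B}.
ESCAPE = re.compile(r'\\x\{([0-9a-fA-F]+)\}')


def touni(match):
    # Python 3: chr() replaces Python 2's unichr().
    return chr(int(match.group(1), 16))


source = '\\x{4e8b}\\x{696d}'
target = ESCAPE.sub(touni, source)   # a str equal to '\u4e8b\u696d'
utf8_bytes = target.encode('utf-8')  # UTF-8 bytes, as in the question
print(utf8_bytes)  # b'\xe4\xba\x8b\xe6\xa5\xad'
```

Since re.sub only touches the matched escapes, any other text in the string is passed through unchanged.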

Answer 3

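import re
source = '\\x{4e8b}\\x{696d}'
# Split on runs of non-word characters, backslashes and 'x', leaving
# only the hex codes (plus empty strings at both ends, filtered out below)
p = re.compile(r'[\W\\x]+')
print ''.join([unichr(int(y, 16)) for y in p.split(source) if y != ''])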
事業

I also borrowed the idea from @Blender.
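A variant of the split-based answer, sketched for Python 3 (chr instead of unichr) and swapping in re.findall, which returns only the captured hex digits, so there are no empty strings to filter out:

```python
import re

source = '\\x{4e8b}\\x{696d}'

# findall returns just the capture group for each \x{...} escape.
hex_codes = re.findall(r'\\x\{([0-9a-fA-F]+)\}', source)  # ['4e8b', '696d']
target = ''.join(chr(int(h, 16)) for h in hex_codes)
print(ascii(target))  # prints '\u4e8b\u696d'
```

Like the split version, this discards any text outside the escapes, so it only fits strings made entirely of \x{...} sequences.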

3 answers

solution1  4  2013-03-09 04:45:13
solution2  4  ACCEPTED  2013-03-09 04:49:34
solution3  0  2013-03-09 04:52:51