
Convert long hex to unicode character such as emoji, programmatically

Given a variable containing the hex code point of an emoji character as a str (e.g., s = '1f602'), how can I programmatically write it to a file as a UTF-8 encoded emoji character?

This question doesn't do it programmatically; it requires the code point itself to be included in the source code.

I know that this works, but only in Python 3:

import codecs
s = '1f602'
with codecs.open('test.out', 'w', 'utf-8') as outfile:
    # Build the literal u"\U0001f602" as text, then eval it into a character
    outfile.write('{}\n'.format(eval('u"{}{}"'.format(r'\U000', s))))

The file, when opened in a supported text editor, will show a single emoji character.

How can I make this work in Python 2 as well, and without eval?

I thought unichr would work, but it only accepts code points below 0x10000.

You could also go through UTF-32 encoding:

import struct

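def fullchr(n):
    # Pack the code point as a 4-byte little-endian integer and decode it as UTF-32LE
    return struct.pack('<I', n).decode('utf-32le')

# outfile is the file object opened in the question
outfile.write(fullchr(0x1F602))   # int('1F602', 16)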
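This works because UTF-32 stores each code point as a single 4-byte integer, so the packed bytes are already a valid UTF-32LE encoding of the character. A minimal sketch that starts from the hex string in the question follows; the name hex_to_char is hypothetical and only used for illustration.

```python
import struct

def hex_to_char(hex_str):
    # Hypothetical helper: parse the hex string into an integer code point,
    # then reinterpret its 4 little-endian bytes as UTF-32LE text.
    code_point = int(hex_str, 16)
    return struct.pack('<I', code_point).decode('utf-32le')

emoji = hex_to_char('1f602')
# Encoding the result as UTF-8 gives the bytes that end up in the file.
utf8_bytes = emoji.encode('utf-8')
```

On a narrow Python 2 build the returned unicode object should hold a surrogate pair, which still encodes to the same four UTF-8 bytes.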

Or, from Python 3.3 onwards, there is no longer such a thing as a narrow build, so you can just use chr(0x1F602).
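For the Python 3.3+ case, the whole task can be sketched in a few lines, assuming the variable s and the file name test.out from the question:

```python
s = '1f602'

# chr accepts any code point up to 0x10FFFF on Python 3.3+
emoji = chr(int(s, 16))

# Open in text mode with an explicit encoding so the result does not
# depend on the platform's default encoding.
with open('test.out', 'w', encoding='utf-8') as outfile:
    outfile.write(emoji + '\n')
```

This version does not run on Python 2, where the built-in open has no encoding parameter.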

This works in both Python 2 and 3. It uses the safer ast.literal_eval to build the character, since as you found, unichr won't work for characters above U+FFFF on a narrow Python 2 build.

import ast
import io

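s = '1f602'
# Build the source text of a unicode literal, e.g. u'\U0001F602'
s2 = "u'\\U{:08X}'".format(int(s, 16))
# literal_eval only evaluates literals, so no arbitrary code can run
c = ast.literal_eval(s2)
with io.open('test.txt', 'w', encoding='utf8') as f:
    f.write(c)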
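If the hex string comes from untrusted input, it may be worth checking the range before building the literal. The following is a sketch of the same approach wrapped in a function; the name char_from_hex and the range check are assumptions added for illustration, not part of the original answer.

```python
import ast

def char_from_hex(hex_str):
    # Hypothetical helper wrapping the literal_eval approach above.
    code_point = int(hex_str, 16)
    # Unicode code points run from 0 to 0x10FFFF inclusive.
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError('not a Unicode code point: %r' % hex_str)
    # For '1f602' this produces the text u'\U0001F602'.
    literal = "u'\\U{:08X}'".format(code_point)
    return ast.literal_eval(literal)

c = char_from_hex('1f602')
```

Because int(s, 16) always yields an integer and the format string pads it to eight hex digits, the text passed to ast.literal_eval is always a well-formed string literal.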
