
Many emoji characters are not read by python file read

I have a JSON file containing a dictionary of about 1500 emoji characters, and I want to import them into my Python code. I read the file and converted it to a Python dictionary, but now I have only 143 records. How can I import all the emoji into my code? This is my code:

import sys
import ast

file = open('emojidescription.json','r').read()
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
emoji_dictionary = ast.literal_eval(file.translate(non_bmp_map))

#word = word.replaceAll(",", " ");

keys = list(emoji_dictionary["emojis"][0].keys())
values = list(emoji_dictionary["emojis"][0].values())

file_write = open('output.txt','a')

print(len(keys))
for i in range(len(keys)):
    try:
        content = 'word = word.replace("{0}", "{1}")'.format(keys[i],values[i][0])
    except Exception as e:
        content = 'word = word.replace("{0}", "{1}")'.format(keys[i],'')
    #file.write()
    #print(keys[i],values[i])
    print(content)


file_write.close()

This is my input sample

{

    "emojis": [
        {

            "👨‍🎓": ["Graduate"],
            "©": ["Copy right"],
            "®": ["Registered"],
            "👨‍👩‍👧": ["family"],
            "👩‍❤️‍💋‍👩": ["love"],
            "™": ["trademark"],
            "👨‍❤‍👨": ["love"], 
            "⌚": ["time"],
            "⌛": ["wait"], 
            "⭐": ["star"],
            "🐘": ["Elephant"],
            "🐕": ["Cat"],
            "🐜": ["ant"],
            "🐔": ["cock"],
            "🐓": ["cock"],

This is my result; the 143 is the number of emoji that were read.

143

word = word.replace(" ‍ ‍ ‍ ", "family")

word = word.replace("Ⓜ", "")

word = word.replace("♥", "")

word = word.replace("♠", "")

word = word.replace("⌛", "wait")

I'm not sure why you're seeing only 143 records from an input of 1500 (your sample doesn't seem to display this behavior).

The setup doesn't seem to do anything useful, but what you're doing boils down to (simplified and skipping lots of details):

# d is the JSON file read into a Python dict
keys = list(d.keys())      # list() makes the key view indexable in Python 3
values = list(d.values())
for i in range(len(keys)):
    key = keys[i]
    value = values[i]

and that should be completely correct. There are better ways to do this in Python, however, like using the zip function:

# d is the JSON file read into a Python dict
keys = d.keys()
values = d.values()
for key, value in zip(keys, values):  # zip picks pair-wise elements
    ...

or simply asking the dict for its items:

for key, value in d.items():
    ...
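
As a quick check of the three loop styles above, here is a minimal runnable sketch. The dictionary `d` is a made-up stand-in for the data loaded from the JSON file, not the asker's real content; it only shows that indexing, zip and items() walk the same pairs in the same order.

```python
# Hypothetical stand-in for the dict loaded from the JSON file.
d = {"star": ["bright"], "ant": ["insect"], "empty": []}

# Index-based loop: the views must be turned into lists first.
keys = list(d.keys())
values = list(d.values())
by_index = [(keys[i], values[i]) for i in range(len(keys))]

# zip pairs up the two views element by element.
by_zip = list(zip(d.keys(), d.values()))

# items() yields the (key, value) pairs directly.
by_items = list(d.items())

print(by_index == by_zip == by_items)
```

All three give the same result, so items() is usually the one to reach for.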

The json module makes reading and writing json much simpler (and safer), and using the idiom from above the problem reduces to this:

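import json

# json.load accepts a binary file object; the with block closes the file.
with open('emoji.json', 'rb') as fp:
    emojis = json.load(fp)

with open('output.py', 'wb') as fp:
    for k, v in emojis['emojis'][0].items():
        # Use the first description, or an empty string if the list is empty.
        val = u'word = word.replace("{0}", "{1}")\n'.format(k, v[0] if v else "")
        fp.write(val.encode('utf-8'))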

Why do you replace every character outside the Basic Multilingual Plane (which includes most emoji) with 0xfffd in these lines:

non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
emoji_dictionary = ast.literal_eval(file.translate(non_bmp_map))

Just don't do this!
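
A likely reason for the missing records, sketched under the assumption that most keys in the file are non-BMP emoji: after the translation, different emoji all turn into the same U+FFFD character, and a dict keeps only one entry per key. The two-entry `source` string below is a hypothetical sample, not the asker's file.

```python
import ast
import sys

# Hypothetical sample: two different keys, each a single non-BMP code point
# (elephant and ant), written with escapes so the file encoding does not matter.
source = '{"\U0001F418": ["Elephant"], "\U0001F41C": ["ant"]}'

# Same mapping as in the question: every non-BMP character becomes U+FFFD.
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
translated = source.translate(non_bmp_map)

before = ast.literal_eval(source)
after = ast.literal_eval(translated)

# Both keys are now the same string, so the later entry overwrites the earlier.
print(len(before), len(after))  # 2 1
```

ZWJ sequences with the same number of parts collide in the same way, which would explain how roughly 1500 entries can shrink to 143.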

Using json:

import json

with open('emojidescription.json', encoding="utf8") as fp:
    emojis = json.load(fp)

with open('output.txt','a', encoding="utf8") as output:
    for emoji, text in emojis["emojis"][0].items():
        text = "" if not text else text[0]
        output.write('word = word.replace("{0}", "{1}")\n'.format(emoji, text))
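
If the generated replace lines are only an intermediate step, the loaded dictionary could also be applied to the text directly. This is a rough sketch, not part of the answers above: the function name `replace_emojis`, the `sample` mapping and its "man" entry are all made up for illustration. It sorts the keys longest first on the assumption that a ZWJ sequence should be matched before the single emoji it contains.

```python
def replace_emojis(text, mapping):
    """Replace each emoji key found in text with its first description."""
    # Longest keys first, so a ZWJ sequence is replaced before its components.
    for emoji in sorted(mapping, key=len, reverse=True):
        descriptions = mapping[emoji]
        text = text.replace(emoji, descriptions[0] if descriptions else "")
    return text


# Hypothetical sample in the same shape as emojis["emojis"][0].
sample = {
    "\U0001F468\u200D\U0001F393": ["Graduate"],  # man + ZWJ + graduation cap
    "\U0001F468": ["man"],                       # made-up entry for the demo
    "\u2B50": ["star"],
    "\u24C2": [],                                # no description: emoji removed
}

result = replace_emojis("\U0001F468\u200D\U0001F393 \u2B50 \U0001F468\u24C2", sample)
print(result)  # Graduate star man
```

With the real data, `sample` would be replaced by `emojis["emojis"][0]` from the json.load call above.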
