
Regular Expression to remove emoticons in Python

I am trying to remove emoticons from a piece of text. I tried this regex from another question, but it doesn't remove any emoticons. Can you tell me what I am doing wrong, or whether there is a better regex for removing emojis from a string?

import re
myre = re.compile(
    u'('
    u'\ud83c[\udf00-\udfff]|'
    u'\ud83d[\udc00-\ude4f\ude80-\udeff]|'
    u'[\u2600-\u26FF\u2700-\u27BF])+',
    re.UNICODE)

def clean(inputFile,outputFile):
    with open(inputFile, 'r') as original,open(outputFile, 'w+') as out:
        for line in original:
            line=myre.sub('', line)

Something like this? In Python 2, decode both the pattern and each line to unicode before substituting:

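import re

# Python 2: start from a byte string and decode its \u escapes,
# so the compiled pattern is a unicode string.
myre = re.compile((
    '('
    '\ud83c[\udf00-\udfff]|'
    '\ud83d[\udc00-\ude4f\ude80-\udeff]|'
    '[\u2600-\u26FF\u2700-\u27BF])+').decode('unicode_escape'),
    re.UNICODE)

def clean(inputFile, outputFile):
    with open(inputFile, 'r') as original, open(outputFile, 'w+') as out:
        for line in original:
            # Decode the raw bytes to unicode before matching.
            line = myre.sub('', line.decode('utf-8'))
            # Write the cleaned line to the output file as UTF-8 bytes.
            out.write(line.encode('utf-8'))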
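
For Python 3, where str has no .decode method, a rough equivalent might look like the sketch below. Python 3 strings are sequences of code points, so the surrogate-pair alternatives from the question are written as the code point ranges they encode (U+1F300 to U+1F3FF, U+1F400 to U+1F64F, U+1F680 to U+1F6FF). The names EMOJI_RE and strip_emoji are made up for this example, and it assumes the input file is UTF-8.

```python
import re

# Same ranges as the pattern in the question, expressed as code points:
#   \ud83c[\udf00-\udfff]  -> U+1F300..U+1F3FF
#   \ud83d[\udc00-\ude4f]  -> U+1F400..U+1F64F
#   \ud83d[\ude80-\udeff]  -> U+1F680..U+1F6FF
# plus the BMP ranges U+2600..U+26FF and U+2700..U+27BF.
EMOJI_RE = re.compile(
    '['
    '\U0001F300-\U0001F3FF'
    '\U0001F400-\U0001F64F'
    '\U0001F680-\U0001F6FF'
    '\u2600-\u26FF\u2700-\u27BF'
    ']+')


def strip_emoji(text):
    """Return text with every character in the ranges above removed."""
    return EMOJI_RE.sub('', text)


def clean(input_path, output_path):
    """Copy input_path to output_path, stripping emoji line by line."""
    # Assumption: both files are UTF-8 encoded.
    with open(input_path, 'r', encoding='utf-8') as original, \
         open(output_path, 'w', encoding='utf-8') as out:
        for line in original:
            out.write(strip_emoji(line))
```

These ranges only mirror the ones in the question. Emoji outside them would pass through unchanged, so the character class may need extending for other input.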

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
