
Replace special characters from a list in Python

How do I replace special characters (emoticons) with a given feature name?

For example:

emoticons = \
    [   ('__EMOT_SMILEY',   [':-)', ':)', '(:', '(-:', ] )  ,\
        ('__EMOT_LAUGH',        [':-D', ':D', 'X-D', 'XD', 'xD', ] )    ,\
        ('__EMOT_LOVE',     ['<3', ':\*', ] )   ,\
        ('__EMOT_WINK',     [';-)', ';)', ';-D', ';D', '(;', '(-;', ] ) ,\
        ('__EMOT_FROWN',        [':-(', ':(', ] )   ,\
        ('__EMOT_CRY',      [':,(', ':\'(', ':"(', ':(('] ) ,\
    ]

msg = 'I had a beautiful day :)'

Desired output:

>> I had a beautiful day __EMOT_SMILEY

I know how to do it with a dict, but here I have multiple values associated with each feature.

The following code will not work in this case, because emoticons is a list of tuples and has no items() method:

for emote, replacement in emoticons.items():
  msg = msg.replace(emote, replacement)

You could use a dictionary and a regex:

import re

def replace(msg, emoticons):
    d = {r: emote for emote, replacement in emoticons for r in replacement}
    pattern = "|".join(map(re.escape, d))
    msg = re.sub(pattern, lambda match: d[match.group()], msg)
    return msg

print(replace(msg, emoticons))  # I had a beautiful day __EMOT_SMILEY
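
One caveat with the regex above: alternatives are tried in the order they appear in the pattern, so ':(' can win over ':((' and leave a stray '('. A minimal sketch of one way around that, assuming it is acceptable to sort the emoticons longest first (the name replace_longest_first and the shortened list are only for illustration):

```python
import re

# Shortened version of the list from the question, for illustration only.
emoticons = [
    ('__EMOT_SMILEY', [':-)', ':)', '(:', '(-:']),
    ('__EMOT_FROWN', [':-(', ':(']),
    ('__EMOT_CRY', [':,(', ":'(", ':"(', ':((']),
]

def replace_longest_first(msg, emoticons):
    # Map each emoticon to its feature name.
    lookup = {symbol: name for name, symbols in emoticons for symbol in symbols}
    # Longer alternatives go first so ':((' is tried before ':('.
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    return pattern.sub(lambda m: lookup[m.group()], msg)

print(replace_longest_first('Rainy day :((', emoticons))  # Rainy day __EMOT_CRY
```

Without the sort, the same input would come out as 'Rainy day __EMOT_FROWN(' because ':(' is listed before ':(('.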

This oughta do it:

emoticons = [   ('__EMOT_SMILEY',   [':-)', ':)', '(:', '(-:', ] ),
        ('__EMOT_LAUGH',    [':-D', ':D', 'X-D', 'XD', 'xD', ] ),
        ('__EMOT_LOVE',     ['<3', ':\*', ] ),
        ('__EMOT_WINK',     [';-)', ';)', ';-D', ';D', '(;', '(-;', ] ),
        ('__EMOT_FROWN',        [':-(', ':(', '(:', '(-:', ] ),
        ('__EMOT_CRY',      [':,(', ':\'(', ':"(', ':(('] )
    ]

emoticons = dict(emoticons)    
emoticons = {v: k for k in emoticons for v in emoticons[k]}

msg = 'I had a beautiful day :)'

for item in emoticons:
    if item in msg:
        msg = msg.replace(item, emoticons[item])

So, you create a dict, invert it, and replace every emoticon that occurs in the sentence.

Try this instead:

emoticons = [
    ('__EMOT_SMILEY', [':-)', ':)', '(:', '(-:',]),
    ('__EMOT_LAUGH',  [':-D', ':D', 'X-D', 'XD', 'xD',]),
    ('__EMOT_LOVE',   ['<3', ':\*',]),
    ('__EMOT_WINK',   [';-)', ';)', ';-D', ';D', '(;', '(-;',]),
    ('__EMOT_FROWN',  [':-(', ':(', '(:', '(-:',]),
    ('__EMOT_CRY',    [':,(', ':\'(', ':"(', ':((',]),
]

msg = 'I had a beautiful day :)'

for key, replaceables in dict(emoticons).items():
  for replaceable in replaceables:
    msg = msg.replace(replaceable, key)

print(msg)
>>> I had a beautiful day __EMOT_SMILEY

emoticons = [   ('__EMOT_SMILEY',   [':-)', ':)', '(:', '(-:', ] )  ,
    ('__EMOT_LAUGH',        [':-D', ':D', 'X-D', 'XD', 'xD', ] )    ,
    ('__EMOT_LOVE',     ['<3', ':\*', ] )   ,
    ('__EMOT_WINK',     [';-)', ';)', ';-D', ';D', '(;', '(-;', ] ) ,
    ('__EMOT_FROWN',        [':-(', ':(', '(:', '(-:', ] )  ,
    ('__EMOT_CRY',      [':,(', ':\'(', ':"(', ':(('] ) ,
]


msg = 'I had a beautiful day :)'

for emote, replacement in emoticons:
    for symbol in replacement:
        msg = msg.replace(symbol, emote)

print(msg)

How about this:

emoticons = [('__EMOT_SMILEY',   [':-)', ':)', '(:', '(-:']),
             ('__EMOT_LAUGH',    [':-D', ':D', 'X-D', 'XD', 'xD']),
             ('__EMOT_LOVE',     ['<3', ':\*']),
             ('__EMOT_WINK',     [';-)', ';)', ';-D', ';D', '(;', '(-;']),
             ('__EMOT_FROWN',    [':-(', ':(', '(:', '(-:']),
             ('__EMOT_CRY',      [':,(', ':\'(', ':"(', ':(('])]

msg = 'I had a beautiful day :)'

# Every character that occurs in any emoticon.
grabs = {ch for _, patterns in emoticons for pattern in patterns for ch in pattern}

for word in [x for x in msg.split() if all(y in grabs for y in x)]:
    for emot_code, search_patterns in emoticons:
        if word in search_patterns:
            msg = msg.replace(word, emot_code)
print(msg)  # I had a beautiful day __EMOT_SMILEY

Instead of searching the message for every emoticon, it first looks for words that might be emoticons and tries to replace only those.

That said, it fails when punctuation comes right before or after the emoticon, e.g. "I had a beautiful day :)."

So all in all.. "__EMOT_FROWN"
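
For the trailing-punctuation case mentioned above, one possible workaround is to peel punctuation off the end of each word before looking it up. This is only a sketch: replace_tokens and its trailing parameter are made-up names, the list is shortened, and it assumes words are separated by single spaces and that only the characters '.,!?' need stripping.

```python
# Shortened version of the list from the question, for illustration only.
emoticons = [
    ('__EMOT_SMILEY', [':-)', ':)', '(:', '(-:']),
    ('__EMOT_FROWN', [':-(', ':(']),
    ('__EMOT_CRY', [':,(', ":'(", ':"(', ':((']),
]

# Emoticon -> feature name.
lookup = {symbol: name for name, symbols in emoticons for symbol in symbols}

def replace_tokens(msg, lookup, trailing='.,!?'):
    out = []
    for token in msg.split(' '):
        # Separate the word from any punctuation stuck to its end.
        core = token.rstrip(trailing)
        tail = token[len(core):]
        # Swap the word if it is a known emoticon, then re-attach the punctuation.
        out.append(lookup.get(core, core) + tail)
    return ' '.join(out)

print(replace_tokens('I had a beautiful day :).', lookup))
# I had a beautiful day __EMOT_SMILEY.
```

Punctuation in front of an emoticon is not handled here; that would need a similar step on the left side.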

There are plenty of answers giving you exactly what you asked for, but sometimes I think exactly what you asked for isn't the best solution. Like tobias_k said, the cleanest solution is to map many keys to the same value, essentially "reversing" your dictionary:

emoticons = \
    [   ('__EMOT_SMILEY',   [':-)', ':)', '(:', '(-:', ] )  ,\
        ('__EMOT_LAUGH',        [':-D', ':D', 'X-D', 'XD', 'xD', ] )    ,\
        ('__EMOT_LOVE',     ['<3', ':\*', ] )   ,\
        ('__EMOT_WINK',     [';-)', ';)', ';-D', ';D', '(;', '(-;', ] ) ,\
        ('__EMOT_FROWN',        [':-(', ':(', '(:', '(-:', ] )  ,\
        ('__EMOT_CRY',      [':,(', ':\'(', ':"(', ':(('] ) ,\
    ]

emote_dict = {emote: name for name, vals in emoticons for emote in vals}

The above code reverses the dictionary, so now it can be used like this:

>>> print(emote_dict[':)'])
__EMOT_SMILEY
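
One thing to watch when reversing: in the list above, '(:' and '(-:' are listed under both __EMOT_SMILEY and __EMOT_FROWN, and the dict comprehension silently keeps whichever name comes last. A small sketch for spotting such duplicates before building the lookup (find_conflicts is a hypothetical helper, and the list is cut down to the two affected entries):

```python
from collections import defaultdict

# Only the two entries that share emoticons, for illustration.
emoticons = [
    ('__EMOT_SMILEY', [':-)', ':)', '(:', '(-:']),
    ('__EMOT_FROWN', [':-(', ':(', '(:', '(-:']),
]

def find_conflicts(emoticons):
    # Collect every feature name each emoticon is listed under.
    owners = defaultdict(list)
    for name, symbols in emoticons:
        for symbol in symbols:
            owners[symbol].append(name)
    # Keep only the emoticons claimed by more than one feature.
    return {symbol: names for symbol, names in owners.items() if len(names) > 1}

conflicts = find_conflicts(emoticons)
print(conflicts)
# {'(:': ['__EMOT_SMILEY', '__EMOT_FROWN'], '(-:': ['__EMOT_SMILEY', '__EMOT_FROWN']}
```

If this returns anything, it is probably worth deciding which feature each emoticon belongs to before reversing.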

You can try using a dict. This should work as long as the emoticon is the last thing in the message, is only 2 or 3 characters long, and is preceded by a space. I'm sure you can make it more robust, but this will work for now.

emoticons = {
    '__EMOT_SMILEY': {':-)', ':)', '(:', '(-:'},
    '__EMOT_LAUGH' : {':-D', ':D', 'X-D', 'XD', 'xD'},
    '__EMOT_LOVE' : {'<3', ':\*'},
    '__EMOT_WINK' :{';-)', ';)', ';-D', ';D', '(;', '(-;'},
    '__EMOT_FROWN' : {':-(', ':(', '(:', '(-:'},
    '__EMOT_CRY' : {':,(', ':\'(', ':"(', ':(('}
        }

msg = 'I had a beautiful day :,('
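# The emoticon is assumed to be the last 2 or 3 characters of the message.
if msg[-3] == ' ':
    img = msg[-2:]
else:
    img = msg[-3:]

# Text before the emoticon, without the separating space.
text = msg[:-len(img)].rstrip()

for k, v in emoticons.items():
    if img in v:
        print(text, k)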
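
If the fixed 2-or-3-character slicing feels too fragile, a possible variation is to test the end of the message against each emoticon with str.endswith, longest first. This is just a sketch under the same assumption that the emoticon sits at the very end of the message; replace_trailing is a made-up name and the dict is shortened.

```python
# Shortened version of the dict above, for illustration only.
emoticons = {
    '__EMOT_SMILEY': {':-)', ':)', '(:', '(-:'},
    '__EMOT_FROWN': {':-(', ':('},
    '__EMOT_CRY': {':,(', ":'(", ':"(', ':(('},
}

def replace_trailing(msg, emoticons):
    # Flatten to (emoticon, feature name) pairs.
    pairs = [(symbol, name) for name, symbols in emoticons.items() for symbol in symbols]
    # Check longer emoticons first so ':((' is not mistaken for ':('.
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    for symbol, name in pairs:
        if msg.endswith(symbol):
            return msg[:-len(symbol)] + name
    # No emoticon at the end: leave the message alone.
    return msg

print(replace_trailing('I had a beautiful day :,(', emoticons))
# I had a beautiful day __EMOT_CRY
```

This works for emoticons of any length, but it still ignores emoticons in the middle of the message.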

