
Replacing repeated consecutive characters in Python

I need to make a function that replaces repeated, consecutive characters with a single character, for example:

 'hiiii how are you??' -> 'hi how are you?'
 'aahhhhhhhhhh whyyyyyy' -> 'ah why'
 'foo' -> 'fo'
 'oook. thesse aree enoughh examplles.' -> 'ok. these are enough examples.'

You can try a regular expression like (.)\1+, i.e. "something, then one or more of the same something", and replace it with \1, i.e. "that first something".

>>> import re
>>> re.sub(r"(.)\1+", r"\1", 'aahhhhhhhhhh whyyyyyy')
'ah why'
>>> re.sub(r"(.)\1+", r"\1", 'oook. thesse aree enoughh examplles.')
'ok. these are enough examples.'

Make it a function with functools.partial (or any other way you like):

>>> import functools
>>> dedup = functools.partial(re.sub, r"(.)\1+", r"\1")
>>> dedup('oook. thesse aree enoughh examplles.')
'ok. these are enough examples.'
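
If you would rather have a named function than a partial, something like the sketch below should work. The names _REPEATS and dedup_compiled are just illustrative, not from the answer above. It also passes re.DOTALL, which is my own assumption about what you want: without that flag "." does not match a newline, so runs of blank lines would be left alone.

```python
import re

# Compile once. re.DOTALL lets "." match "\n" as well, so repeated
# newlines are collapsed too (an assumption; drop the flag if unwanted).
_REPEATS = re.compile(r"(.)\1+", re.DOTALL)

def dedup_compiled(text):
    """Collapse each run of identical consecutive characters to one character."""
    return _REPEATS.sub(r"\1", text)

print(dedup_compiled("hiiii how are you??"))  # hi how are you?
print(repr(dedup_compiled("a\n\n\nb")))       # 'a\nb'
```

Compiling the pattern up front mainly helps readability; the re module caches recently used patterns anyway.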

A solution can be expressed very compactly using itertools.groupby:

>>> import itertools
>>> ''.join(g[0] for g in itertools.groupby('hiiii how are you??'))
'hi how are you?'

itertools.groupby groups consecutive items of an iterable by a key function: a group keeps growing as long as successive keys are equal. If no key function is given, each item is its own key, which here means the characters themselves.

Each item that groupby yields is a (key, group) tuple: the key (here, the character) and an internal itertools._grouper iterator over the items in that run. For this purpose you can ignore the iterator and take only the key, g[0]. Joining the keys then gives a string with one character per run.
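
To see what groupby actually hands back, here is a small sketch that unpacks each pair and counts the length of every run. The helper name runs is made up for illustration.

```python
import itertools

def runs(text):
    """Return (character, run_length) for each run of identical consecutive characters."""
    # Each group is an iterator over the items in the run, so it is
    # turned into a list here just to count its elements.
    return [(key, len(list(group))) for key, group in itertools.groupby(text)]

print(runs("hiiii"))  # [('h', 1), ('i', 4)]

# Keeping only the keys collapses every run to a single character.
collapsed = "".join(key for key, _group in itertools.groupby("hiiii how are you??"))
print(collapsed)      # hi how are you?
```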

This can be turned into a function as follows:

def remove_repeated_characters(s):
    groups = itertools.groupby(s)
    cleaned = ''.join(g[0] for g in groups)
    return cleaned

This results in the expected values:

>>> [remove_repeated_characters(s)
...  for s in ['hiiii how are you??', 'aahhhhhhhhhh whyyyyyy',
...            'foo', 'oook. thesse aree enoughh examplles.']]
['hi how are you?', 'ah why', 'fo', 'ok. these are enough examples.']
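
A plain loop that remembers the previous character also works:

def dup_char_remover(text):
    output = ""
    previous = ""
    for c in text:
        # Keep a character only when it differs from the one before it.
        if c != previous:
            output += c
        previous = c
    return output

text = "hiiii how arrrre youuu"
print(dup_char_remover(text))

Output:

hi how are you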

Another version uses a simple iteration and collects the characters in a list.

Demo:

def cleanText(val):
    result = []
    for i in val:
        # Append when the list is empty or the character differs from the last one kept.
        if not result or result[-1] != i:
            result.append(i)
    return "".join(result)

s = ['hiiii how are you??', 'aahhhhhhhhhh whyyyyyy', 'foo', 'oook. thesse aree enoughh examplles.']
for i in s:
    print(cleanText(i))

Output:

hi how are you?
ah why
fo
ok. these are enough examples.
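
As a quick sanity check, the regex, groupby and loop approaches should give the same result on the question's examples. The sketch below restates each one under a hypothetical name (with_regex, with_groupby, with_loop) so that it runs on its own.

```python
import itertools
import re

def with_regex(text):
    return re.sub(r"(.)\1+", r"\1", text)

def with_groupby(text):
    return "".join(key for key, _group in itertools.groupby(text))

def with_loop(text):
    result = []
    for ch in text:
        if not result or result[-1] != ch:
            result.append(ch)
    return "".join(result)

samples = ['hiiii how are you??', 'aahhhhhhhhhh whyyyyyy',
           'foo', 'oook. thesse aree enoughh examplles.']

for s in samples:
    # All three approaches should agree on these inputs.
    assert with_regex(s) == with_groupby(s) == with_loop(s)
    print(with_loop(s))
```

One difference to keep in mind: the regex version, as written, does not collapse repeated newlines, because "." does not match "\n" unless re.DOTALL is set.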
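
Another answer uses OrderedDict. Note that it removes every repeated character within a word, not only consecutive ones, and that it collapses the whitespace between words to single spaces:

from collections import OrderedDict

def removeDupWord(word):
    # fromkeys keeps the first occurrence of each character, in order.
    return "".join(OrderedDict.fromkeys(word))

def removeDupSentence(sentence):
    # Deduplicate each whitespace-separated word, then rejoin with single spaces.
    return " ".join(removeDupWord(word) for word in sentence.split())


sentence = 'hiiii how are you??'
print(removeDupSentence(sentence))

Output:

hi how are you?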
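
Be careful with the OrderedDict approach: it drops every later occurrence of a character in a word, not just consecutive repeats. A short comparison makes the difference visible (dedup_all and dedup_consecutive are illustrative names, not from the answers above):

```python
from collections import OrderedDict
import itertools

def dedup_all(word):
    # Keeps only the first occurrence of each character.
    return "".join(OrderedDict.fromkeys(word))

def dedup_consecutive(word):
    # Collapses runs of identical adjacent characters only.
    return "".join(key for key, _group in itertools.groupby(word))

print(dedup_all("examplles"))          # exampls  (the second 'e' is lost)
print(dedup_consecutive("examplles"))  # examples
```

So it happens to match on 'hiiii how are you??', but not on the question's last example.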


 