Let's say I have this string: Sayy Hellooooooo
If N = 2,
I want the result (using regex) to be: Sayy Helloo
Thank you in advance.
You could build the regex dynamically for a given n, and then call sub without a callback:
import re
n = 2
regex = re.compile(rf"((.)\2{{{n-1}}})\2+")
s = "Sayy Hellooooooo"
print(regex.sub(r"\1", s)) # Sayy Helloo
{{ : this double brace represents a literal brace in an f-string.
{n-1} injects the value of n-1, so together with the additional (double) brace wrap, {{{n-1}}} produces {2} when n is 3.
\2+ matches any further occurrences of that same character, so these are the characters that need removal.
\1 thus reproduces the allowed repetition, but omits the additional repetition of that same character.
Another option is to use re.sub with a callback:
N = 2
result = re.sub(r'(.)\1+', lambda m: m.group(0)[:N], your_string)
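As a quick check that the two approaches above agree, here is a minimal sketch that wraps each one in a function and runs both on the string from the question. The function names `limit_runs_pattern` and `limit_runs_callback` are made up for illustration, and the sketch assumes n is at least 1.

```python
import re

def limit_runs_pattern(s: str, n: int) -> str:
    # Group 1 holds the first n copies of a character; \2+ matches the surplus.
    regex = re.compile(rf"((.)\2{{{n - 1}}})\2+")
    return regex.sub(r"\1", s)

def limit_runs_callback(s: str, n: int) -> str:
    # Match any run of two or more identical characters and keep its first n.
    return re.sub(r"(.)\1+", lambda m: m.group(0)[:n], s)

s = "Sayy Hellooooooo"
for n in (1, 2, 3):
    print(n, limit_runs_pattern(s, n), "|", limit_runs_callback(s, n))
```

The callback version uses a fixed pattern and moves the dependence on n into plain Python slicing, which some readers may find easier to follow than the triple braces.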
You could use backreferences to match the previous character. So (a|b)\1 would match aa or bb. In your case you probably want any letter and any number of repetitions, so ([a-zA-Z])\1{n,} for N repetitions. Then substitute it with one occurrence using \1 again. So putting it all together:
import re
n = 2
expression = r"([a-zA-Z])\1{" + str(n) + ",}"
print(re.sub(expression, r"\1", "hellooooo friiiiiend"))
# Outputs: hello friend
Note this actually matches only runs of N+1 or more repetitions (one item, then at least N copies of it), like your test cases. If you also want to match runs of exactly N, subtract 1 from n.
Remember to put r in front of regular expressions so you don't need to double-escape backslashes.
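The off-by-one in the note above can be checked directly. In this sketch, `collapse_runs` is a hypothetical helper and `min_extra` is simply the number that ends up inside the `{...,}` quantifier.

```python
import re

def collapse_runs(text: str, min_extra: int) -> str:
    # One letter followed by at least `min_extra` more copies of it,
    # i.e. a run of min_extra + 1 or more; the whole run becomes one letter.
    expression = r"([a-zA-Z])\1{" + str(min_extra) + ",}"
    return re.sub(expression, r"\1", text)

print(collapse_runs("aa bbb cccc", 2))  # aa b c
print(collapse_runs("aa bbb cccc", 1))  # a b c
```

With 2 in the quantifier the pair "aa" is left alone, and with 1 it is collapsed as well.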
Learn more about backreferences: https://www.regular-expressions.info/backref.html
Learn more about repetition: https://www.regular-expressions.info/repeat.html
You need a regex that searches for multiple occurrences of the same character; that is done with (.)\1 (the \1 matches whatever group 1, the part in parentheses, captured).
To match 2 identical characters: (.)\1
To match 3 identical characters: (.)\1\1 or (.)\1{2}
To match 4 identical characters: (.)\1\1\1 or (.)\1{3}
So you can build it with an f-string and the value you want (that's a bit ugly because the literal braces need to be escaped by doubling them, with one more pair of braces inside to interpolate the value itself).
import re

def remove_letters(value: str, count: int) -> str:
    # Deletes every group of count + 1 identical consecutive characters.
    return re.sub(rf"(.)\1{{{count}}}", "", value)

print(remove_letters("Sayy Hellooooooo", 1))  # Sa Heo
print(remove_letters("Sayy Hellooooooo", 2))  # Sayy Hello
print(remove_letters("Sayy Hellooooooo", 3))  # Sayy Hellooo
You may find the pattern easier to understand when it is built by plain concatenation:
r"(.)\1{" + str(count) + "}"
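To confirm that the f-string and the concatenation produce the same pattern, a small sketch follows. The two builder functions are hypothetical names introduced only for this comparison.

```python
import re

def build_pattern_fstring(count: int) -> str:
    # {{ and }} are literal braces; {count} is the interpolated value.
    return rf"(.)\1{{{count}}}"

def build_pattern_concat(count: int) -> str:
    return r"(.)\1{" + str(count) + "}"

for count in (1, 2, 3):
    pattern = build_pattern_fstring(count)
    assert pattern == build_pattern_concat(count)
    # The pattern matches exactly count + 1 identical characters.
    print(count, pattern, bool(re.fullmatch(pattern, "o" * (count + 1))))
```

Printing the generated pattern is a cheap way to debug this kind of dynamically built regex before passing it to re.sub.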
This seems to work:
N=2: the regex pattern is compiled to: ((\w)\2{2,})
N=3: the regex pattern is compiled to: ((\w)\2{3,})
import re
N = 2
p = re.compile(r"((\w)\2{" + str(N) + r",})")
text = "Sayy Hellooooooo"
matches = p.findall(text)
for match in matches:
    text = re.sub(match[0], match[1]*N, text)
print(text)
Output: Sayy Helloo
Also tested with N=3, N=4 and other text inputs.
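One caveat with the findall loop: each matched run is reused as a new pattern over the whole text, so a shorter run can also hit part of a longer run of the same character elsewhere (for example "aaa aaaa" with N=2 ends up as "aa aaa"). A possible single-pass variant, sketched here with the same pattern and a hypothetical function name `truncate_runs`, replaces each run where it is found:

```python
import re

def truncate_runs(text: str, n: int) -> str:
    # Runs of more than n identical word characters are cut down to n.
    pattern = re.compile(r"((\w)\2{" + str(n) + r",})")
    # Group 2 is the repeated character; rebuild the run with n copies of it.
    return pattern.sub(lambda m: m.group(2) * n, text)

print(truncate_runs("Sayy Hellooooooo", 2))  # Sayy Helloo
print(truncate_runs("Sayy Hellooooooo", 3))  # Sayy Hellooo
print(truncate_runs("aaa aaaa", 2))          # aa aa
```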