Regular Expression to match first and last character of a word

Question

I have a string, and I want to write a regular expression in Python that finds pairs of adjacent three-character words whose first and last characters are the same, while the middle character can be anything.

Sample string

s = 'timtimdsikmu nmunju ityakbonbonjdjjd kitkat ghdnj samsun ksuwjkhokhojeuhj jimjam jsju'

I want to extract all the highlighted words from the string above (nmunju, kitkat, samsun and jimjam, set apart by spaces here).

My attempt, which does not do what I need:

import re

s='timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'

re.findall(r'([a-z].[a-z])(\1)',s)

This gives me:

[('tim', 'tim'), ('mun', 'mun'), ('bon', 'bon'), ('kho', 'kho')]

But I want this:

[('kit', 'kat'), ('sam', 'sun'), ('jim', 'jam'), ('nmu', 'nju')]

Thanks

Answer 1

You can use capturing groups and backreferences:

s='timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'

import re
out = re.findall(r'((.).(.)\2.\3)', s)
[e[0] for e in out]

output:

['timtim', 'munmun', 'bonbon', 'kitkat', 'khokho', 'jimjam']

To ensure the middle letters differ, add a negative lookahead:

[e[0] for e in re.findall(r'((.)(.)(.)\2(?!\3).\4)', s)]

output:

['nmunju', 'kitkat', 'jimjam']

Edit: to split each match into its two words:

>>> [(e[0][:3], e[0][3:]) for e in re.findall(r'((.)(.)(.)\2(?!\3).\4)', s)]
[('nmu', 'nju'), ('kit', 'kat'), ('jim', 'jam')]
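
As a reading aid, here is the same pattern written with re.VERBOSE so each piece can carry a comment. This is only an annotated sketch of the regex above, not a different method; the names pattern and pairs are mine. It also shows why the attempt in the question, ([a-z].[a-z])(\1), found only exact repeats: \1 has to match the same three characters again, whereas \2 and \4 here pin down only the first and last characters.

```python
import re

s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
pattern = re.compile(r"""
    (          # group 1: the whole six-character match
      (.)      # group 2: first character of the first word
      (.)      # group 3: middle character of the first word
      (.)      # group 4: last character of the first word
      \2       # the second word starts with the same character as group 2
      (?!\3)   # its middle character must not equal group 3
      .        # middle character of the second word
      \4       # the second word ends with the same character as group 4
    )
""", re.VERBOSE)

# Split each six-character match into its two three-character words.
pairs = [(m.group(1)[:3], m.group(1)[3:]) for m in pattern.finditer(s)]
print(pairs)  # [('nmu', 'nju'), ('kit', 'kat'), ('jim', 'jam')]
```

Note that 'samsun' is not returned: its two halves end in different letters (m and n), so it does not satisfy the rule as stated in the question.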

Answer 2

There is always the pure Python way:

s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'

result = []
for i in range(len(s) - 5):
    word = s[i:(i+6)]
    if (word[0] == word[3] and word[2] == word[5] and word[1] != word[4]):
        result.append(word)
    
print(result)

Output:

['nmunju', 'kitkat', 'jimjam']
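
If it helps, the loop above can be wrapped in a function that returns the split pairs directly. This is a minimal sketch: the name find_pairs and the word-length parameter n are my own additions, and for n greater than 3 I am assuming "different middle" means the two words differ anywhere between their first and last characters.

```python
def find_pairs(s, n=3):
    """Return (first, second) pairs of adjacent n-character words that share
    their first and last characters but are not identical. Assumes n >= 1."""
    pairs = []
    # The last window of 2*n characters starts at len(s) - 2*n.
    for i in range(len(s) - 2 * n + 1):
        first, second = s[i:i + n], s[i + n:i + 2 * n]
        if first[0] == second[0] and first[-1] == second[-1] and first != second:
            pairs.append((first, second))
    return pairs


s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
print(find_pairs(s))  # [('nmu', 'nju'), ('kit', 'kat'), ('jim', 'jam')]
```

Unlike re.findall, which resumes scanning after the end of each match, this checks every start position, so overlapping pairs would also be reported if the input contained any.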

Answer 3

You can use this regex in Python:

(?P<first>([a-z])(.)([a-z]))(?P<second>\2(?!\3).\4)

The named group first captures the first word, and the named group second captures the second word.

(?!\3) is a negative lookahead that makes sure the middle character of the second word differs from the middle character of the first word.

import re

rx = re.compile(r"(?P<first>([a-z])(.)([a-z]))(?P<second>\2(?!\3).\4)")
s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
for m in rx.finditer(s):
    print(m.group('first'), m.group('second'))

Output:

nmu nju
kit kat
jim jam
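
A possible variation, if the numbered backreferences are hard to keep track of: Python also supports named backreferences with (?P=name). The sketch below names the individual characters as well; head, mid and tail are names I made up for illustration, and the behavior should be the same as the regex above.

```python
import re

s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'

rx = re.compile(
    # First word: remember each of its three characters by name.
    r"(?P<first>(?P<head>[a-z])(?P<mid>.)(?P<tail>[a-z]))"
    # Second word: same head, a different middle, same tail.
    r"(?P<second>(?P=head)(?!(?P=mid)).(?P=tail))"
)

pairs = [(m.group('first'), m.group('second')) for m in rx.finditer(s)]
print(pairs)  # [('nmu', 'nju'), ('kit', 'kat'), ('jim', 'jam')]
```

With names, adding or removing a group no longer shifts the numbers that the backreferences depend on.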

Answer 4

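You can also do it with a plain for loop (it may be faster, but time it on your own data):

s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'

result2 = []
# Stop 5 short of the end so that s[i+5] is always in range.
for i in range(len(s) - 5):
    # Same first and last letters, different middle letter.
    if s[i] == s[i+3] and s[i+2] == s[i+5] and s[i+1] != s[i+4]:
        result2.append((s[i:i+3], s[i+3:i+6]))

print(result2)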
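
Whether a loop actually beats the regex depends on the input and the Python version, so it is worth measuring. Below is a rough sketch using the standard timeit module; the function names with_regex and with_loop and the run count are arbitrary choices of mine, and the loop includes the middle-letter check so both functions answer the same question.

```python
import re
import timeit

s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
rx = re.compile(r'(.)(.)(.)\1(?!\2).\3')


def with_regex(text):
    # finditer yields non-overlapping six-character matches.
    return [(m.group()[:3], m.group()[3:]) for m in rx.finditer(text)]


def with_loop(text):
    out = []
    for i in range(len(text) - 5):
        if (text[i] == text[i+3] and text[i+2] == text[i+5]
                and text[i+1] != text[i+4]):
            out.append((text[i:i+3], text[i+3:i+6]))
    return out


# Sanity check: both approaches agree on this input.
assert with_regex(s) == with_loop(s)

for fn in (with_regex, with_loop):
    seconds = timeit.timeit(lambda: fn(s), number=2_000)
    print(f"{fn.__name__}: {seconds:.4f}s for 2,000 runs")
```

The two functions agree here, but they can differ on inputs with overlapping matches, because the regex skips past each match while the loop tests every position.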

solution1 7 ACCEPTED 2021-10-08 16:40:34
solution2 3 2021-10-08 16:40:21
solution3 2 2021-10-08 18:37:51
solution4 1 2021-10-08 17:27:27
