
Find the number of substrings in a string between two characters (a*b) using the Python re module

Given a string S as input, the program must find the number of substrings matching the pattern a*b, where * represents one or more alphabetic characters.

import re
s = input()
matches = re.findall(r'MAGIC',s)
print(len(matches))


'''
i/p - aghba34bayyb
o/p - 2
(i.e aghb,ayyb)
It should not take a34b in count.

i/p - aabb
o/p - 3
(i.e aab abb aabb)

i/p : adsbab 
o/p : 2 
(i.e adsb ab)'''

You can use

a[a-zA-Z]+?b



import re
s = input()
matches = re.findall(r'a[a-zA-Z]+?b',s)
print(len(matches))
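
A quick check of this pattern against the sample inputs from the question may be useful. re.findall scans left to right and returns only non-overlapping matches, so the counts can be lower than the ones the question expects for inputs such as aabb. The helper name count_non_overlapping below is made up for illustration.

```python
import re

# Same pattern as above: 'a', one or more letters (non-greedy), then 'b'.
PATTERN = re.compile(r'a[a-zA-Z]+?b')

def count_non_overlapping(s):
    """Count matches the way re.findall does: left to right, no overlaps."""
    return len(PATTERN.findall(s))

for sample in ['aghba34bayyb', 'aabb', 'adsbab']:
    print(sample, PATTERN.findall(sample), count_non_overlapping(sample))

# aghba34bayyb ['aghb', 'ayyb'] 2
# aabb ['aab'] 1
# adsbab ['adsb'] 1
```

If overlapping substrings such as abb and aabb must also be counted, a single findall call is not enough.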


Using re.finditer to match all substrings:

inputs = ['aghba34bayyb',
'aabb',
'adsbab']

import re

def all_substrings(s):
    length, seen = len(s), set()
    for i in range(length):
        for j in range(i + 1, length + 1):
            for g in re.finditer(r'(a[^\d]+b)', s[i:j]):
                if (i+g.start(), i+g.end()) in seen:
                    continue
                seen.add((i+g.start(), i+g.end()))
                yield g.groups()[0]

for i in inputs:
    print('Input="{}" Matches:'.format(i))
    for s in all_substrings(i):
        print(' "{}"'.format(s))

Prints:

Input="aghba34bayyb" Matches:
 "aghb"
 "ayyb"
Input="aabb" Matches:
 "aab"
 "aabb"
 "abb"
Input="adsbab" Matches:
 "adsb"
 "adsbab"

You can find the positions of a and b in the word, build every substring that starts at an a and ends at a later b, and then keep only the substrings that contain one or more letters in between.

import re
from itertools import product

words = ['aghba34bayyb', 'aabb', 'adsbab']

for word in words:
    a_pos = [i for i,c in enumerate(word) if c=='a']
    b_pos = [i for i,c in enumerate(word) if c=='b']
    all_substrings = [word[s:e+1] for s,e in product(a_pos, b_pos) if e>s]
    substrings = [s for s in all_substrings if re.match(r'a[a-zA-Z]+b$', s)]
    print (word, substrings)

Output

aghba34bayyb ['aghb', 'ayyb']
aabb ['aab', 'aabb', 'abb']
adsbab ['adsb', 'adsbab']
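
Since the question asks for a count rather than the substrings themselves, the same idea could be wrapped in a function. This is only a sketch: the name count_matches is hypothetical, and re.fullmatch is used in place of re.match with a trailing $.

```python
import re
from itertools import product

def count_matches(word):
    """Count substrings that start with 'a', end with 'b',
    and have one or more letters (and nothing else) in between."""
    a_pos = [i for i, c in enumerate(word) if c == 'a']
    b_pos = [i for i, c in enumerate(word) if c == 'b']
    return sum(
        1
        for s, e in product(a_pos, b_pos)
        # e > s keeps only pairs where the 'b' comes after the 'a'
        if e > s and re.fullmatch(r'a[a-zA-Z]+b', word[s:e + 1])
    )

for word in ['aghba34bayyb', 'aabb', 'adsbab']:
    print(word, count_matches(word))

# aghba34bayyb 2
# aabb 3
# adsbab 2
```

Note that ab is not counted for adsbab, because the pattern requires at least one letter between a and b.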

re.findall(r'a[A-Za-z]+?b',s)

Where

  • [A-Za-z] matches a single alphabetic character,
  • + repeats it one or more times,
  • ? makes the + non-greedy, so it matches as few characters as possible.
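
To see what the ? changes, the greedy and non-greedy forms can be compared on a made-up input (aghbayyb is chosen here for illustration and is not one of the question's samples):

```python
import re

text = 'aghbayyb'

# Greedy: + grabs as many letters as it can, then backs off to the last 'b'.
greedy = re.findall(r'a[A-Za-z]+b', text)

# Non-greedy: +? stops at the first 'b' that completes a match.
lazy = re.findall(r'a[A-Za-z]+?b', text)

print(greedy)  # ['aghbayyb']
print(lazy)    # ['aghb', 'ayyb']
```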

You could match a followed by one character in a-z, then use a character class that matches a or c-z zero or more times, and then match the first b

a[a-z][ac-z]*b


If you also want to consume all of the b's that follow, so that the match is aabb instead of aab, you could use

a[a-z][ac-z]*b+


import re
s = input()
matches = re.findall(r'a[a-z][ac-z]*b+',s)
print(len(matches))
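
As a rough comparison, both patterns can be run against the sample inputs from the question. The variable names below are made up for illustration. Both patterns accept lowercase letters only, and both return non-overlapping matches.

```python
import re

first_b = re.compile(r'a[a-z][ac-z]*b')    # stops at the first b
all_bs = re.compile(r'a[a-z][ac-z]*b+')    # also consumes the b's that follow

results = {}
for sample in ['aghba34bayyb', 'aabb', 'adsbab']:
    results[sample] = (first_b.findall(sample), all_bs.findall(sample))
    print(sample, *results[sample])

# aghba34bayyb ['aghb', 'ayyb'] ['aghb', 'ayyb']
# aabb ['aab'] ['aabb']
# adsbab ['adsb'] ['adsb']
```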

