
Python: Finding all occurrences of a substring in a string without using regex

I have a string:

b = 'Can you can a can as a canner can can a can?'

I need to find all the starting and ending positions of the substring "can" in the string b, irrespective of case. I can do it with a regular expression, but I need minimal code for the same operation without regular expressions (that is, without importing re). Here is my code using regex:

import re
b_find = [(i.start() , i.end()) for i in re.finditer(r"can",b.lower())]

I want a solution without regex, possibly using a list comprehension. Is there any way?

Yes, there is, but it is neither super-elegant nor very efficient. Here it goes:

b_find = [(i, i+3) for i in range(len(b)-2) if b[i:i+3].lower() == 'can']

It produces the same result as your regex-based code, that is:

[(0, 3), (8, 11), (14, 17), (23, 26), (30, 33), (34, 37), (40, 43)]
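
The comprehension above hard-codes the length 3 and the word 'can'. A minimal sketch of the same sliding-window idea for an arbitrary substring follows; the helper name find_all is hypothetical, and it assumes that lower-casing does not change the length of the text (true for ASCII input such as b):

```python
def find_all(text, sub):
    """Return (start, end) pairs for every case-insensitive match of sub.

    Hypothetical helper; assumes str.lower() keeps the text length unchanged.
    """
    text, sub = text.lower(), sub.lower()
    n = len(sub)
    # Compare a window of len(sub) characters at every possible start position.
    return [(i, i + n) for i in range(len(text) - n + 1) if text[i:i + n] == sub]


b = 'Can you can a can as a canner can can a can?'
b_find = find_all(b, 'can')
```

Because every start position is tested, overlapping matches are reported too: find_all('aaa', 'aa') gives both (0, 2) and (1, 3).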

Written as a function, this should serve your purpose:

>>> def split_indices(s, sep):
...     current = 0
...     sep_len = len(sep)
...     sections = s.lower().split(sep.lower())  # lower-case both sides
...     for section in sections[:-1]:  # skip trailing entry
...         current += len(section)
...         yield (current, current+sep_len)
...         current += sep_len

The function is a generator, so if you want to get the result as a list, you'd either have to re-write the function to return a list instead or unpack the result into a list:

>>> b = 'Can you can a can as a canner can can a can?'
>>> [*split_indices(b, 'can')]
[(0, 3), (8, 11), (14, 17), (23, 26), (30, 33), (34, 37), (40, 43)]
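
For the other option mentioned above, rewriting the function to return a list, a sketch could look like this. The name split_indices_list is hypothetical, and a non-empty separator is assumed because str.split raises ValueError for an empty one:

```python
def split_indices_list(s, sep):
    """List-returning variant of split_indices (hypothetical name).

    Assumes sep is non-empty; str.split('') raises ValueError.
    """
    sep = sep.lower()
    sep_len = len(sep)
    result = []
    current = 0
    # The last piece comes after the final match, so it is skipped.
    for section in s.lower().split(sep)[:-1]:
        current += len(section)                        # jump to the start of the match
        result.append((current, current + sep_len))
        current += sep_len                             # continue after the match
    return result


b = 'Can you can a can as a canner can can a can?'
spans = split_indices_list(b, 'can')
```

Since str.split consumes each separator as it goes, this approach reports only non-overlapping matches: split_indices_list('aaa', 'aa') yields just (0, 2).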

An even simpler variation is:

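block = 'Can you can a can as a canner can can a can?'.lower()
index = -1
indexes = []  # start positions only; each match ends at start + len('can')
try:
    while True:
        # str.index raises ValueError once there is no further match
        index = block.index('can', index + 1)
        indexes.append(index)
except ValueError:
    pass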
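
The loop above collects only start positions. A sketch of the same idea that yields (start, end) pairs, as the question asks, might use str.find, which returns -1 instead of raising an exception. The function name find_spans is hypothetical:

```python
def find_spans(text, sub):
    """Yield (start, end) for each case-insensitive match of sub (hypothetical helper)."""
    text, sub = text.lower(), sub.lower()
    start = text.find(sub)
    while start != -1:                     # str.find returns -1 when nothing is left
        yield (start, start + len(sub))
        # Resume one character later so overlapping matches are found as well.
        start = text.find(sub, start + 1)


b = 'Can you can a can as a canner can can a can?'
b_find = list(find_spans(b, 'can'))
```

The search itself is done by str.find, so the Python-level loop runs once per match rather than once per character.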

It's a very simple linear finite automaton. It would become a bit more complex if you had a word like 'cacan', but for 'can' it is really easy:

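def nextCan(text, state):
    # state = number of leading characters of 'can' matched so far
    for i, ch in enumerate(text):
        if state == 2 and ch == 'n':
            yield (i - 2, i + 1)  # the match is text[i-2:i+1]
            state = 0
        elif state == 1 and ch == 'a':
            state = 2
        elif ch == 'c':
            state = 1  # a 'c' always (re)starts a match, e.g. in 'ccan'
        else:
            state = 0

# lower-case the input so that 'Can' is matched as well
b_find = list(nextCan(b.lower(), 0))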
