How to get all the substrings of a string using regex in Python

Question

I have a string such as: "12345"

Using regex, how can I get all of its substrings that consist of one to three consecutive characters, producing output such as:

'1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345'

Answer 1

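You can use re.findall with a capturing group inside a positive lookahead. Because the lookahead itself consumes no characters, the search advances one position at a time and overlapping substrings are found. The group matches any character repeated size times, with size iterated from 1 to 3: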

[match for size in range(1, 4) for match in re.findall('(?=(.{%d}))' % size, s)]
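
To see why the lookahead matters, here is a minimal sketch comparing a plain pattern with the lookahead version for a fixed size of 2. The variable names plain and overlapping are only illustrative:

```python
import re

s = '12345'

# A plain pattern consumes the characters it matches,
# so successive matches cannot overlap.
plain = re.findall('.{2}', s)

# A lookahead consumes nothing; the capturing group inside it
# records the text, so a match is attempted at every position.
overlapping = re.findall('(?=(.{2}))', s)

print(plain)        # ['12', '34']
print(overlapping)  # ['12', '23', '34', '45']
```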

However, it would be more efficient to skip the regex entirely and slice the string directly, using a list comprehension that iterates through all the sizes and starting indices:

[s[start:start + size] for size in range(1, 4) for start in range(len(s) - size + 1)]

Given s = '12345', both of the above would return:

['1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345']
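
For a runnable version, the two expressions can be wrapped in functions and checked against each other. This is only a sketch: the function names substrings_regex and substrings_slices and the max_size parameter are made up for illustration and are not part of the original answer.

```python
import re


def substrings_regex(s, max_size=3):
    """Collect substrings of length 1..max_size using a lookahead with a capturing group."""
    return [match
            for size in range(1, max_size + 1)
            for match in re.findall(r'(?=(.{%d}))' % size, s)]


def substrings_slices(s, max_size=3):
    """Collect the same substrings by slicing at every valid start index."""
    return [s[start:start + size]
            for size in range(1, max_size + 1)
            for start in range(len(s) - size + 1)]


if __name__ == '__main__':
    s = '12345'
    print(substrings_regex(s))
    print(substrings_slices(s))
    print(substrings_regex(s) == substrings_slices(s))  # True
```

One difference to keep in mind: by default . does not match a newline, so the regex version skips any substring containing '\n' unless re.DOTALL is passed as the flags argument to re.findall, while the slicing version includes it.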