REGEX - match list item followed by another list item 'n' times

Question

Suppose we have a list:

search_list = ['one', 'two', 'three', 'four', 'five', 'six']

and we want to match items from this list that directly follow one another, any number (n) of times in a row, in the following string:

example_string = "This string has one two three and also five six in it"

How would we build a regex that finds every run of list items that are adjacent to one another?

In this case, using re.findall, the output should be:

[('one', 'two', 'three'), ('five', 'six')]

Here's what I've tried so far.

First, convert the list into a regex alternation:

import re

chain_regex = [re.escape(i) for i in search_list]
chain_regex = '|'.join(chain_regex)
re.findall(rf'({chain_regex})\s*({chain_regex})', example_string)

This works fine and produces the following output:

[('one', 'two'), ('five', 'six')]
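A possible refinement of chain_regex, shown as a minimal sketch: the plain alternation also matches items inside longer words (the `one` in `someone`, for instance). Wrapping it in a non-capturing group with `\b` word boundaries avoids that, and sorting the escaped items longest first lets a longer item win over a shorter item it starts with (this matters for multi-word items such as `new` versus `new york`). `build_alternation` is a hypothetical helper name.

```python
import re

def build_alternation(items):
    """Return a non-capturing alternation that matches any item as a whole word."""
    # Escape each item so characters like '.' or '+' are matched literally,
    # and put longer items first so that e.g. 'new york' is tried before 'new'.
    escaped = sorted((re.escape(item) for item in items), key=len, reverse=True)
    # \b on both sides stops 'one' from matching inside 'someone'.
    return r"\b(?:" + "|".join(escaped) + r")\b"


search_list = ['one', 'two', 'three', 'four', 'five', 'six']
alt = build_alternation(search_list)
print(re.findall(alt, "someone has two"))  # ['two']
```

Because the group is non-capturing, re.findall returns the whole matched item rather than a tuple of groups.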

Suppose I want to do this for n items in a row. How would I restructure the pattern so that it repeats, rather than chaining copies of the group indefinitely as below:

re.findall(rf'({chain_regex})\s*({chain_regex})\s*({chain_regex})*\s*({chain_regex})*', example_string)  # etc.

EDIT

re.findall(rf'({chain_regex})(\s*({chain_regex}))+', example_string)

produces the following output, which isn't quite right:

[('one', ' three', 'three'), ('five', ' six', 'six')]

Chaining more and more groups together does work; however, I can't always know in advance how many times I'd need to chain it, and this is where I'm stuck.
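The output in the edit is expected behavior for Python's re module: a capturing group inside a repeated group is overwritten on each repetition, so findall reports only the last one. That is why the second element is ' three' (the outer group, with its leading space) and the third is 'three' (the inner group). A minimal demonstration on a toy pattern:

```python
import re

# (\w+) sits inside a repeated group, so each iteration overwrites it
# and only the final word is kept once the match is complete.
m = re.fullmatch(r"(?:(\w+)\s*)+", "one two three")
print(m.group(0))  # one two three
print(m.group(1))  # three
```

The whole run is still available as group 0, so one way forward is to capture the entire run and split it into items afterwards.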

Answer 1

You can do this with a simple regex, but you have to filter the results:

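import re

test1 = "This string has one two three and also five six in it"
# Each item may be followed by one optional character (normally the space); * repeats that
reg = re.compile(r"(((one|two|three|four|five|six).?)*)")
matches = reg.findall(test1)  # 3-tuples; m[0] is the whole run and is often ''
# Split each run into words, dropping the empty strings left by trailing spaces
runs = [[w for w in m[0].split(" ") if w] for m in matches]
# Keep only runs with at least two items
filtered = [r for r in runs if len(r) > 1]
filtered  # [['one', 'two', 'three'], ['five', 'six']]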

Example (updated): https://regex101.com/r/YhlhRQ/4
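The same idea (match the whole run, then split it into items) can also be sketched without hardcoding the alternation or filtering out empty matches: a quantified non-capturing group only matches runs of at least min_items items, which plays the role of the question's "n times". This is a minimal sketch, assuming each item is a single word and that items in a run are separated by whitespace; find_runs and min_items are hypothetical names.

```python
import re

def find_runs(items, text, min_items=2):
    """Return tuples of items from `items` that appear consecutively in `text`.

    Only runs of at least `min_items` items are returned. Assumes every item
    is a single word and that items in a run are separated by whitespace.
    """
    if min_items < 1:
        raise ValueError("min_items must be at least 1")
    alt = "|".join(re.escape(item) for item in items)
    # One item, then at least (min_items - 1) more, each preceded by whitespace.
    # The groups are non-capturing, so group(0) holds the whole run.
    pattern = rf"\b(?:{alt})(?:\s+(?:{alt})){{{min_items - 1},}}\b"
    return [tuple(m.group(0).split()) for m in re.finditer(pattern, text)]


search_list = ['one', 'two', 'three', 'four', 'five', 'six']
example_string = "This string has one two three and also five six in it"
print(find_runs(search_list, example_string))
# [('one', 'two', 'three'), ('five', 'six')]
print(find_runs(search_list, example_string, min_items=3))
# [('one', 'two', 'three')]
```

Setting min_items=1 also returns lone items as one-element tuples. Because the separator is `\s+` and the pattern is anchored with `\b`, items embedded in longer words (such as the `one` in `someone`) are not reported.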

1 answers

solution1
1 2019-10-29 13:14:15