REGEX - match list item followed by another list item 'n' times

Suppose we have a list

search_list = ['one', 'two', 'three', 'four', 'five', 'six']

and we want to find every run of items from this list that appear next to one another, however many items the run contains, in the following string

example_string = "This string has one two three and also five six in it"

How would we build a regex which can find all items which are adjacent to one another?

In this case, searching with re.findall, the output should be

[('one', 'two', 'three'), ('five', 'six')]

Here's what I've tried so far

Convert list into searchable string:

import re

# Escape each item, then join them into one alternation: one|two|three|...
chain_regex = [re.escape(i) for i in search_list]
chain_regex = '|'.join(chain_regex)
re.findall(rf'({chain_regex})\s*({chain_regex})', example_string)

This works fine and produces the following output:

[('one', 'two'), ('five', 'six')]

Suppose I want to do this n times. How would you restructure this query so that it can be repeated without just chaining it indefinitely as below:

re.findall(rf'({chain_regex})\s*({chain_regex})\s*({chain_regex})*\s*({chain_regex})*', example_string)  # and so on

EDIT

re.findall(rf'({chain_regex})(\s*({chain_regex}))+', example_string)

produces the following output which isn't quite right.

[('one', ' three', 'three'), ('five', ' six', 'six')]

Chaining more and more items together does work. However, I can't always be sure how many times I'd need to chain the pattern, and this is where I'm stuck.
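For reference, the output in the EDIT is what Python's re module does with a quantified capturing group: each repetition overwrites the previous one, so only the last repetition is reported. The following is a minimal sketch of that behaviour using the example string from above; the names `m`, `whole_match` and `groups` are only illustrative.

```python
import re

search_list = ["one", "two", "three", "four", "five", "six"]
example_string = "This string has one two three and also five six in it"
chain_regex = "|".join(re.escape(i) for i in search_list)

# The inner group is repeated by '+', but a group keeps only its last repetition.
m = re.search(rf"({chain_regex})(\s*({chain_regex}))+", example_string)

whole_match = m.group(0)  # the full text the pattern consumed
groups = m.groups()       # the groups re.findall would report for this match
print(whole_match)  # one two three
print(groups)       # ('one', ' three', 'three')
```

So the whole run is matched correctly; it is only the per-group capture that loses the middle items.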

You can do this with a simple regex, but you have to filter the results:

import re

test = "This string has one two three and also five six in it"
reg = re.compile(r"(((one|two|three|four|five|six).?)*)")
match = re.findall(reg, test)
# m[0] is the whole run; split() also drops the trailing separator
filtered = [m[0].split() for m in match]
# keep only runs of two or more items (this also drops the empty matches)
filtered = [f for f in filtered if len(f) > 1]
filtered  # [['one', 'two', 'three'], ['five', 'six']]

Example: (updated) https://regex101.com/r/YhlhRQ/4
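A possible variation on the same idea, sketched here rather than taken from the answer above: match each whole run with a non-capturing repeated group, then split the matched text. The helper name `find_adjacent_runs` and its `min_len` parameter are hypothetical. The sketch assumes every list item is a single word containing no whitespace and starting and ending with a word character, so that `\b` and `split()` behave as intended.

```python
import re

def find_adjacent_runs(items, text, min_len=2):
    """Return tuples of listed items that appear consecutively in text.

    Assumes each item is a single whitespace-free word (hypothetical helper).
    """
    alternation = "|".join(re.escape(i) for i in items)
    # \b stops 'one' from matching inside 'someone'.
    # (?:...) repeats without capturing, so no repetition is lost;
    # the {k,} quantifier demands at least min_len items in total.
    run_pattern = re.compile(
        rf"\b(?:{alternation})\b(?:\s+(?:{alternation})\b){{{min_len - 1},}}"
    )
    # group(0) is the whole run; split() turns it back into separate items
    return [tuple(m.group(0).split()) for m in run_pattern.finditer(text)]

search_list = ["one", "two", "three", "four", "five", "six"]
example_string = "This string has one two three and also five six in it"

runs = find_adjacent_runs(search_list, example_string)
print(runs)  # [('one', 'two', 'three'), ('five', 'six')]
```

Because the run is taken from the whole match rather than from capture groups, the pattern does not need to be chained once per expected item.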
