Suppose we have a list
search_list = ['one', 'two', 'three', 'four', 'five', 'six']
and we want to find every run of two or more items from this list that appear next to each other, for runs of any length, in the following string
example_string = 'This string has one two three and also five six in it'
How would we build a regex which can find all items which are adjacent to one another?
In this case, searching with re.findall, the output should be
[('one', 'two', 'three'), ('five', 'six')]
Here's what I've tried so far
Convert list into searchable string:
chain_regex = [re.escape(i) for i in search_list]
chain_regex = '|'.join(chain_regex)
re.findall(rf'({chain_regex})\s*({chain_regex})', example_string)
This works fine and produces the following output:
[('one', 'two'), ('five', 'six')]
Suppose I want to do this n times. How would you restructure this query so that it can be repeated without just chaining it indefinitely as below:
re.findall(rf'({chain_regex})\s*({chain_regex})\s*({chain_regex})*\s*({chain_regex})*', example_string) and so on
EDIT
re.findall(rf'({chain_regex})(\s*({chain_regex}))+', example_string)
produces the following output which isn't quite right.
[('one', ' three', 'three'), ('five', ' six', 'six')]
Chaining more and more items together does work; however, I can't always be sure how many times I'd need to chain it, and this is where I'm stuck.
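A note on the EDIT attempt: in Python's re module, a capture group that sits inside a repeated group keeps only its last iteration, which is why 'two' is missing from the first tuple even though the overall match covers it. The sketch below reuses the names from the question; whole_match and groups are illustrative names introduced here.

```python
import re

search_list = ['one', 'two', 'three', 'four', 'five', 'six']
example_string = 'This string has one two three and also five six in it'
chain_regex = '|'.join(re.escape(i) for i in search_list)

# Same pattern as in the EDIT, but inspected with re.search instead of re.findall
m = re.search(rf'({chain_regex})(\s*({chain_regex}))+', example_string)

whole_match = m.group(0)  # the full run, including the middle item
groups = m.groups()       # repeated groups hold only their final iteration

print(whole_match)  # one two three
print(groups)       # ('one', ' three', 'three')
```

So the full run is matched correctly; it is only the per-group view returned by re.findall that drops the intermediate items.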
You can do this with a simple regex, but you have to filter the results:
import re
test = "This string has one two three and also five six in it"
# Match a run of list items, each optionally followed by one character (e.g. a space)
reg = re.compile(r"(((one|two|three|four|five|six).?)*)")
match = re.findall(reg, test)
# m[0] is the whole run; keep only runs that contain more than one item
filtered = [m[0] for m in match if len(m[0].split()) > 1]
filtered = [f.split() for f in filtered]
filtered  # [['one', 'two', 'three'], ['five', 'six']]
Example: (updated) https://regex101.com/r/YhlhRQ/4
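An alternative that avoids the empty matches and the filtering step, offered as a minimal sketch rather than a definitive solution: match each whole run with non-capturing groups, then split the run back into items with a second pass. The helper name find_chains is hypothetical, and the sketch assumes that items are separated by whitespace and that every item starts and ends with a word character (so that \b behaves as expected).

```python
import re

def find_chains(items, text):
    """Return a tuple for every run of two or more adjacent items in text.

    Hypothetical helper; assumes whitespace-separated items that begin
    and end with word characters.
    """
    # Longest items first, so a longer item is tried before its own prefix
    alt = '|'.join(re.escape(i) for i in sorted(items, key=len, reverse=True))
    # One item followed by one or more (whitespace + item) repetitions
    run_re = re.compile(rf'\b(?:{alt})(?:\s+(?:{alt}))+\b')
    # Used to split a matched run back into its individual items
    item_re = re.compile(rf'\b(?:{alt})\b')
    return [tuple(item_re.findall(m.group(0))) for m in run_re.finditer(text)]

search_list = ['one', 'two', 'three', 'four', 'five', 'six']
example_string = 'This string has one two three and also five six in it'

chains = find_chains(search_list, example_string)
print(chains)  # [('one', 'two', 'three'), ('five', 'six')]
```

Because the outer pattern has no capture groups, re.finditer yields the full run as group 0, so the chain length never has to be known in advance.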