
Combining regexes efficiently in Python

Setup

I dynamically create a list of regexes, regex_list. Each regex in the list is guaranteed to match at least once in the text it is applied to. Some regexes in the list may be identical.

import re

# foo is a list of strings, e.g. foo = ['foo1', 'foo2', 'foo1', ...]
# text is the string the regexes are applied to
regex_list = []
for f in foo:
    # f is a valid expression to be used inside the regex
    regex_list.append(rf'[^.]*?{f}[^.]*\.')  # raw f-string so \. is not an invalid escape

regex = re.compile('|'.join(regex_list), flags=re.DOTALL)
result = regex.findall(text)

Problem

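Since

  1. some regexes in regex_list may be identical, and
  2. the regexes in regex_list are combined with the OR operator (|),

for a regex that has another copy in the list, only the first copy's matches are captured. findall returns non-overlapping matches, and each piece of text is consumed by the first alternative that matches it, so the duplicates never get a chance to match it again.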
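A minimal reproduction of this behaviour, assuming hypothetical sample data (the text and the keywords 'cat' and 'dog' are made up for illustration). The combined pattern returns each sentence once, while applying the same patterns one by one returns the 'cat' sentences once per copy:

```python
import re

# Hypothetical sample data: 'cat' appears twice in foo
text = "A cat sat. The dog ran. A cat slept."
foo = ['cat', 'dog', 'cat']

regex_list = [rf'[^.]*?{f}[^.]*\.' for f in foo]

# Combined with |: each sentence is consumed by the first alternative that matches
combined = re.compile('|'.join(regex_list), flags=re.DOTALL)
combined_matches = combined.findall(text)

# Applied one by one: the second 'cat' pattern finds the 'cat' sentences again
separate_matches = [m for r in regex_list for m in re.findall(r, text, flags=re.DOTALL)]

print(combined_matches)       # ['A cat sat.', ' The dog ran.', ' A cat slept.']
print(len(separate_matches))  # 5
```

The same effect applies to different regexes whose matches overlap: a sentence containing both keywords would also be returned only once by the combined pattern.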

Question

A workaround would be to apply each regex individually in a for-loop, but that is very slow.
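One way to cut down the cost of the loop, sketched under the assumption that one list of matches per entry of foo is the desired output (the function name findall_each is hypothetical): identical patterns always produce identical matches, so each distinct entry only needs to be searched once and its result can be reused for the duplicates.

```python
import re

def findall_each(foo, text):
    """Apply each regex on its own, but scan the text once per *distinct* entry.

    Identical patterns always produce identical matches, so the result for a
    duplicate is reused instead of searching the text again.
    """
    cache = {}
    for f in dict.fromkeys(foo):  # distinct entries, in first-seen order
        cache[f] = re.findall(rf'[^.]*?{f}[^.]*\.', text, flags=re.DOTALL)
    # One list of matches per original entry, duplicates included
    return [list(cache[f]) for f in foo]

text = "A cat sat. The dog ran. A cat slept."
result = findall_each(['cat', 'dog', 'cat'], text)
# [['A cat sat.', ' A cat slept.'], [' The dog ran.'], ['A cat sat.', ' A cat slept.']]
```

If a single flat list is needed, as with the combined pattern, it can be built with [m for ms in result for m in ms]. This only helps when there are many duplicates; with all-distinct entries it does the same work as the plain loop.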

Is there a good way to combine the regexes so that they still return every possible match?
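A different approach, not a way of combining the regexes but a possible substitute for them: because every pattern has the form [^.]*?{f}[^.]*\. it matches a whole "sentence" (a run of non-'.' characters ending in '.') that contains f. Under the assumption that no entry f can itself match a literal '.', the text can be split into sentences once and each distinct entry tested against those sentences. The function name matches_per_entry is hypothetical, and any speed-up over the per-regex loop has not been measured here.

```python
import re

def matches_per_entry(foo, text):
    """Return, for each entry of foo, the sentences of text that contain it.

    Assumes no entry can match a literal '.', so a match never crosses a
    sentence boundary (mirroring the [^.] parts of the original pattern).
    """
    # Single pass over the text: every run of non-'.' characters ending in '.'
    sentences = re.findall(r'[^.]*\.', text)
    found = {}
    for f in dict.fromkeys(foo):  # each distinct entry is tested once
        pattern = re.compile(f, flags=re.DOTALL)
        found[f] = [s for s in sentences if pattern.search(s)]
    # One list per original entry, so duplicates keep their own copy
    return [list(found[f]) for f in foo]

text = "A cat sat. The dog ran. A cat slept."
result = matches_per_entry(['cat', 'dog', 'cat'], text)
```

The idea is to avoid rescanning the whole text with the lazy [^.]*? prefix for every pattern, which retries the match at each position of every sentence that does not contain f.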

I discovered by chance that applying each regex individually in a for-loop is very slow with the re module, but surprisingly faster with the regex module.
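A sketch for comparing the two engines on the same per-regex loop, assuming the third-party regex package (installed with pip install regex) is available; it falls back to re alone if it is not. The sample text and the function name loop_findall are hypothetical, and the timings depend entirely on the machine and input.

```python
import re
import timeit

try:
    import regex  # third-party module, largely compatible with re's API
except ImportError:
    regex = None  # the sketch still runs with re alone

def loop_findall(engine, foo, text):
    """Apply one pattern per entry of foo using the given module (re or regex)."""
    return [engine.findall(rf'[^.]*?{f}[^.]*\.', text, flags=engine.DOTALL)
            for f in foo]

# Hypothetical sample input, only for a rough local comparison
text = "A cat sat. The dog ran. A cat slept. " * 1000
foo = ['cat', 'dog', 'cat']

for engine in [re] + ([regex] if regex is not None else []):
    seconds = timeit.timeit(lambda: loop_findall(engine, foo, text), number=10)
    print(f"{engine.__name__}: {seconds:.3f} s")
```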
