
How to use Python regular expressions to return a list of strings that come before and after a certain pattern?

For example

s = "Before\=String TARGETA After\=String limbo nonsense Before\=String TARGETB After\=String ..... Before\=String TARGETC After\=String"

Result List should be:

['TARGETA','TARGETB','TARGETC']

I've tried

regex = '.*Before\=String(.*?)After\=String.*'
matches = re.search(regex, val).groups()
>> (' TARGETC ',)

The problem is that it only returns the last item.

You need to use re.findall() instead of re.search(), and remove the .* elements from the start and end:

regex = r'Before\\=String(.*?)After\\=String'
matches = re.findall(regex, val)

Demo:

>>> import re
>>> s = "Before\=String TARGETA After\=String limbo nonsense Before\=String TARGETB After\=String ..... Before\=String TARGETC After\=String"
>>> regex = r'Before\\=String(.*?)After\\=String'
>>> re.findall(regex, s)
[' TARGETA ', ' TARGETB ', ' TARGETC ']

Note that this still includes the whitespace; if you don't want that either, add \s* before and after the (...) capturing group.

Use re.findall() to return a list of all matches, and make sure to double escape the backslashes if your actual string does contain them. You can remove the leading/trailing .* because it is not necessary for finding these substrings, and use \s* before and after the capturing group to eat up the excess whitespace.

>>> import re
>>> s = 'Before\=String TARGETA After\=String limbo nonsense Before\=String TARGETB After\=String ..... Before\=String TARGETC After\=String'
>>> re.findall(r'Before\\=String\s*(.*?)\s*After\\=String', s)
['TARGETA', 'TARGETB', 'TARGETC']

It's not clear whether your backslashes are really in the target string. If they are, and require matching, then you need to put them in pairs in the regex, as a simple \= will match just the equals sign.
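To make the backslash point concrete, here is a minimal sketch. It assumes the target string really does contain a backslash before each equals sign; the shortened sample string is my own, and it is written as a raw string literal so the backslash is kept as-is.

```python
import re

# Hypothetical shortened sample; the raw literal keeps the backslash in the text.
s = r"Before\=String X After\=String"

# In a regex, \= is just an escaped "=", so this pattern looks for
# "Before=String" and finds nothing in text that has a backslash there.
single = re.findall(r"Before\=String", s)

# A doubled backslash in the regex matches one literal backslash in the text.
paired = re.findall(r"Before\\=String", s)

# On its own, \= matches only the equals signs.
equals_only = re.findall(r"\=", s)

print(single)       # []
print(paired)       # ['Before\\=String'] (repr shows the backslash doubled)
print(equals_only)  # ['=', '=']
```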

re.search won't do what you ask because it only ever finds the first occurrence of the pattern in the target string. You also don't need .* fore and aft of the core of the regex, because (unless you use re.match, which only matches at the start of the string) a pattern can match anywhere in the target string and doesn't have to match all of it.
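A small sketch of that behaviour, assuming a shortened version of the sample string with literal backslashes (the variable names are mine). It also shows why the original attempt returned the last item: the leading greedy .* swallows as much as it can before the core pattern gets a chance to match.

```python
import re

# Hypothetical shortened sample, written as a raw string literal.
s = r"Before\=String TARGETA After\=String limbo Before\=String TARGETB After\=String"

core = r"Before\\=String(.*?)After\\=String"

# re.search scans the string and stops at the first occurrence.
first = re.search(core, s).group(1)

# A leading greedy .* consumes everything, then backtracks only as far as
# needed, so the core pattern ends up matching the LAST occurrence.
last = re.search(".*" + core + ".*", s).group(1)

# re.match only tries at position 0, so it fails once the string
# no longer starts with the opening marker.
at_start = re.match(core, s).group(1)
not_at_start = re.match(core, "junk " + s)

print(repr(first))     # ' TARGETA '
print(repr(last))      # ' TARGETB '
print(repr(at_start))  # ' TARGETA '
print(not_at_start)    # None
```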

The re.findall function is the one you need. Instead of returning a MatchObject it simply passes back a list of all the substrings within the target string that matched the pattern. Or, if there are any groups in the pattern, it will return the substrings matched by those groups instead of what the whole pattern matched.
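As an illustration of how the return value changes with the number of groups, here is a short sketch, again assuming a shortened sample string with literal backslashes. The two-group pattern is only there to show the tuple form; it is not something the question needs.

```python
import re

# Hypothetical shortened sample, written as a raw string literal.
s = r"Before\=String TARGETA After\=String limbo Before\=String TARGETB After\=String"

# No groups: each list item is the text matched by the whole pattern.
whole = re.findall(r"Before\\=String.*?After\\=String", s)

# One group: each list item is just the text captured by that group.
one_group = re.findall(r"Before\\=String\s*(.*?)\s*After\\=String", s)

# Several groups: each list item is a tuple with one entry per group.
two_groups = re.findall(r"(Before)\\=String\s*(.*?)\s*After\\=String", s)

print(len(whole))   # 2
print(one_group)    # ['TARGETA', 'TARGETB']
print(two_groups)   # [('Before', 'TARGETA'), ('Before', 'TARGETB')]
```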

The code below allows for optional whitespace around the contents of the before and after markers. Also, if you're going to define the regex on a separate line then you may as well compile it there as well. The re.X flag value allows for insignificant whitespace to be added to the regex to make it more readable.

import re

val = "Before\=String TARGETA After\=String limbo nonsense Before\=String TARGETB After\=String ..... Before\=String TARGETC After\=String"

regex   = re.compile(r' Before\\=String \s* (.*?) \s* After\\=String ', flags=re.X)
matches = regex.findall(val)

print(matches)

output

['TARGETA', 'TARGETB', 'TARGETC']
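The re.X flag also allows comments inside the pattern, so the same regex can be spread over several lines. The following is only a sketch of that layout, assuming a shortened sample string with literal backslashes; the name pattern is my own.

```python
import re

# Same regex as above, laid out one piece per line. With re.X, unescaped
# whitespace and anything after "#" inside the pattern is ignored.
pattern = re.compile(
    r"""
    Before\\=String   # opening marker: literal backslash, then "="
    \s*               # optional whitespace, left out of the capture
    (.*?)             # the target text, as short as possible
    \s*               # optional whitespace, left out of the capture
    After\\=String    # closing marker
    """,
    flags=re.X,
)

# Hypothetical shortened sample, written as a raw string literal.
val = r"Before\=String TARGETA After\=String limbo nonsense Before\=String TARGETB After\=String"

# A compiled pattern has its own findall method.
matches = pattern.findall(val)
print(matches)  # ['TARGETA', 'TARGETB']
```

One thing to watch with re.X: a space that must match literally has to be escaped or placed in a character class, otherwise it is ignored along with the layout whitespace.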
