
How to get all the words "bar" after the word "foo" with regex

I am trying to retrieve all occurrences of the word bar that come after the word foo.

Here is the content:

fuu

bar

faa

foo bar fuu

bar fuu bar

bar

bar fuu

I want to retrieve every bar that comes after foo (shown in bold) and disregard the first bar, which comes before foo (shown in italics).

I tried the following regular expression:

(?<=foo)bar

But that only catches the first occurrence.

UPDATE

Thanks for the support, guys. Below is data that is closer to reality:

Some data

name: Person 1

Some data

my_delimiter:

 name: Person 2

 Some data

name: Person 3

Some data

 name: Person 4

 Some data

Some data

I want to get the names of all the persons listed after my_delimiter:

I am testing here https://regex101.com/r/HrCLva/2

Two things, depending on what exactly you're after:

  1. If you're after all the occurrences on a single line, then you need to use re.findall:

     import re

     # See https://regex101.com/r/zzBFFb/1
     exp = re.compile("foo(?:.*?((bar)+)*)")
     match = exp.findall(mystring)
  2. If you're after all occurrences across multiple lines, as above, then you need to add flags so that newlines are not treated specially:

     exp = re.compile("foo(?:.*?((bar)+)*)", re.DOTALL | re.MULTILINE)
     match = exp.findall(mystring)
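A different way to get the same result, sketched here as an alternative rather than as the pattern above: find the first foo, then search only the text that follows it. The function name bars_after_foo and the sample string are made up for illustration, and the sketch assumes foo and bar should match as whole words.

```python
import re

def bars_after_foo(text: str) -> list[str]:
    """Return every whole-word 'bar' found after the first whole-word 'foo'."""
    first_foo = re.search(r"\bfoo\b", text)
    if first_foo is None:
        # No 'foo' anywhere, so no 'bar' can come after it.
        return []
    # Only look at the text that follows the first 'foo'.
    return re.findall(r"\bbar\b", text[first_foo.end():])

# Sample modelled on the content in the question (blank lines omitted).
sample = "fuu\nbar\nfaa\nfoo bar fuu\nbar fuu bar\nbar\nbar fuu\n"
print(bars_after_foo(sample))
```

Because the slice starts after foo, the bar on the second line of the sample is never seen by findall, and newlines need no special flags.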

After you updated your question: you don't need a regex lookbehind. You can use a regex like this to find the first name after your delimiter:

my_delimiter:\s+name:\s*(.*)

Working demo
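In Python, this pattern could be applied roughly as follows. This is a minimal sketch: the variable names and the sample data (modelled on the question, with blank lines omitted) are assumptions for illustration.

```python
import re

# Sample modelled on the data in the question.
data = (
    "Some data\n"
    "name: Person 1\n"
    "Some data\n"
    "my_delimiter:\n"
    " name: Person 2\n"
    " Some data\n"
    "name: Person 3\n"
    "Some data\n"
    " name: Person 4\n"
    " Some data\n"
    "Some data\n"
)

# \s+ skips the newline and indentation after the delimiter;
# (.*) stops at the end of the line because '.' does not match a newline.
first = re.search(r"my_delimiter:\s+name:\s*(.*)", data)
first_name = first.group(1) if first else None
print(first_name)
```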

On the other hand, if you want all names after your delimiter, you can use a regex trick like this and then grab the content from the capturing group:

[\s\S]*my_delimiter|name:\s*(.*)$

Working demo 2

The capturing group stores the data highlighted in green.
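A minimal Python sketch of this trick, under two assumptions: the multiline flag is enabled so that $ matches at the end of each line, and matches whose capturing group is empty (the part consumed by the first alternative) are discarded. The variable names and sample data are illustrative only.

```python
import re

# Sample modelled on the data in the question.
data = (
    "Some data\n"
    "name: Person 1\n"
    "Some data\n"
    "my_delimiter:\n"
    " name: Person 2\n"
    " Some data\n"
    "name: Person 3\n"
    "Some data\n"
    " name: Person 4\n"
    " Some data\n"
    "Some data\n"
)

# The first alternative swallows everything up to the delimiter without
# capturing; the second alternative captures each name that remains.
pattern = re.compile(r"[\s\S]*my_delimiter|name:\s*(.*)$", re.MULTILINE)

# group(1) is None for the match produced by the first alternative.
names = [m.group(1) for m in pattern.finditer(data) if m.group(1) is not None]
print(names)
```

The first match consumes "name: Person 1" along with the rest of the text before the delimiter, which is why that name never reaches the capturing group.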


