Regex - capture words after match

I have a string where I am trying to match word patterns which appear either directly after, or one word after a tag. For example:

after_tag = r'here is sentence as an example where a [TAG] ~~M001~~ a word'
one_after_tag = r'here is sentence as an example where a [TAG] can ~~M001~~ a word'

I would also like to extend this to work with connecting words, which have also been tagged. This should also work within a window of one or two words after the [CONNECT] tag, such as:

after_connect = r'here is a sentence where a [TAG] could [CONNECT] ~~M002~~'
one_after_connect = r'here is a sentence where a [TAG] could [CONNECT] a ~~M002~~'

I have tried the following regex with the re package in Python.

import re
regex_current = re.compile(r'((?:(?<=(\{TAG})))(.*?)\~\~[A-Z0-9]{4,5}\~\~)')

Can anyone help? I've found a regex-testing website helpful for trying out patterns.
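A quick check of why the attempt finds nothing on the sample strings: \{TAG} matches the literal text {TAG} with curly braces, while the strings contain [TAG] with square brackets. This is a minimal sketch using Python's built-in re module; brace_demo is a hypothetical string added only to show that the look-behind does work when braces are present.

```python
import re

after_tag = r'here is sentence as an example where a [TAG] ~~M001~~ a word'
one_after_tag = r'here is sentence as an example where a [TAG] can ~~M001~~ a word'

# The attempt from the question, wrapped in a raw string so that it compiles.
regex_current = re.compile(r'((?:(?<=(\{TAG})))(.*?)\~\~[A-Z0-9]{4,5}\~\~)')

# \{TAG} stands for the literal text "{TAG}" (curly braces), so the
# square-bracket [TAG] in the sample strings never satisfies the look-behind.
print(regex_current.search(after_tag))      # None
print(regex_current.search(one_after_tag))  # None

# Hypothetical string with curly braces: here the look-behind succeeds.
brace_demo = 'a {TAG} ~~M001~~ a word'
print(repr(regex_current.search(brace_demo).group(0)))  # ' ~~M001~~'
```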

Here is one solution that matches the ~~...~~ word in both examples:

(?<=\[TAG\])( \w*)? ~~\w*~~
  1. Look for [TAG] before the match (look-behind).
  2. Optionally match one word (a space followed by word characters, so it cannot contain ~).
  3. Match the ~~...~~ word.
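A minimal sketch of the three steps with Python's re module, run on the two TAG examples from the question. It uses finditer and group(0) to get the whole match, because re.findall on a pattern with one capturing group returns only that group's text.

```python
import re

after_tag = r'here is sentence as an example where a [TAG] ~~M001~~ a word'
one_after_tag = r'here is sentence as an example where a [TAG] can ~~M001~~ a word'

# Step 1: look-behind for [TAG]; step 2: optional word; step 3: the ~~...~~ word.
pattern = re.compile(r'(?<=\[TAG\])( \w*)? ~~\w*~~')

# group(0) is the full match, including the optional in-between word.
matches = [m.group(0) for s in (after_tag, one_after_tag)
           for m in pattern.finditer(s)]
print(matches)  # [' ~~M001~~', ' can ~~M001~~']
```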

If there is a word in between, it becomes part of the match as well, so you can either capture the ~~...~~ word in its own group, or split the result on whitespace and take the last element.
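The split-and-take-the-last-element option might look like this; a small sketch, assuming the same pattern and the two TAG example strings.

```python
import re

pattern = re.compile(r'(?<=\[TAG\])( \w*)? ~~\w*~~')
samples = [
    r'here is sentence as an example where a [TAG] ~~M001~~ a word',
    r'here is sentence as an example where a [TAG] can ~~M001~~ a word',
]

# A match may start with an extra word (' can ~~M001~~'); str.split()
# splits on whitespace, so the ~~...~~ word is always the last element.
words = [pattern.search(s).group(0).split()[-1] for s in samples]
print(words)  # ['~~M001~~', '~~M001~~']
```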

To also cover the CONNECT examples, just OR the same pattern:

(?<=\[TAG\])( \w*)? ~~\w*~~|(?<=\[CONNECT\])( \w*)? ~~\w*~~
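To check that the alternation handles all four strings from the question, here is a short sketch with Python's re. Each alternative has its own capturing group, so the whole match is again read from group(0).

```python
import re

samples = {
    'after_tag': r'here is sentence as an example where a [TAG] ~~M001~~ a word',
    'one_after_tag': r'here is sentence as an example where a [TAG] can ~~M001~~ a word',
    'after_connect': r'here is a sentence where a [TAG] could [CONNECT] ~~M002~~',
    'one_after_connect': r'here is a sentence where a [TAG] could [CONNECT] a ~~M002~~',
}

# Two fixed-width look-behinds joined with |.
pattern = re.compile(r'(?<=\[TAG\])( \w*)? ~~\w*~~|(?<=\[CONNECT\])( \w*)? ~~\w*~~')

results = {name: pattern.search(text).group(0) for name, text in samples.items()}
for name, match in results.items():
    print(f'{name}: {match!r}')
```

In the CONNECT strings the TAG alternative fails, because the word after [TAG] is followed by [CONNECT] rather than the ~~...~~ word, so the match comes from the CONNECT alternative.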

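The two halves are the same regular expression, repeated because a look-behind in Python's re module must have a fixed width, and [TAG] and [CONNECT] have different lengths. If you do not mind the tag being part of the match, you can shorten this to: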

\[(TAG|CONNECT)\]( \w*)? ~~\w*~~
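A sketch of both points, assuming Python's re: combining the two tags inside one look-behind is rejected when the pattern is compiled, while the shortened pattern works but includes the tag in the match.

```python
import re

# TAG (3 chars) and CONNECT (7 chars) differ in width, so a single
# look-behind containing both is rejected at compile time with re.error.
try:
    re.compile(r'(?<=\[(TAG|CONNECT)\])( \w*)? ~~\w*~~')
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)
print(lookbehind_error)

# The shortened pattern consumes the tag instead, so it appears in the match.
short = re.compile(r'\[(TAG|CONNECT)\]( \w*)? ~~\w*~~')
text = r'here is a sentence where a [TAG] could [CONNECT] a ~~M002~~'
match = short.search(text)
print(match.group(0))  # [CONNECT] a ~~M002~~
print(match.group(1))  # CONNECT
```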

Matching on the tag is really only needed if you expect other ~~LettersAndDigits~~ words in the text that should not be matched. If not, you can search for that exactly

~~\w*~~

without anything fancy.
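The difference shows up once an untagged ~~...~~ word is present. A small sketch; the string below is hypothetical and not from the question.

```python
import re

# Hypothetical string: the second ~~...~~ word has no tag in front of it.
text = 'a [TAG] ~~M001~~ and an untagged ~~X999~~'

# Plain search: every ~~...~~ word, tagged or not.
plain = re.findall(r'~~\w*~~', text)

# Tag-anchored search: only the word that follows [TAG].
tagged = [m.group(0).strip()
          for m in re.finditer(r'(?<=\[TAG\])( \w*)? ~~\w*~~', text)]

print(plain)   # ['~~M001~~', '~~X999~~']
print(tagged)  # ['~~M001~~']
```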

Final addition

To require TAG, optionally followed by CONNECT:

\[TAG\]( \w*)?( \[CONNECT\]( \w*)?)? ~~[\w]*~~
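A quick check of this pattern against the four example strings, plus one hypothetical string where CONNECT appears without a preceding TAG (which should not match). A sketch with Python's re.

```python
import re

final = re.compile(r'\[TAG\]( \w*)?( \[CONNECT\]( \w*)?)? ~~[\w]*~~')

samples = [
    r'here is sentence as an example where a [TAG] ~~M001~~ a word',
    r'here is sentence as an example where a [TAG] can ~~M001~~ a word',
    r'here is a sentence where a [TAG] could [CONNECT] ~~M002~~',
    r'here is a sentence where a [TAG] could [CONNECT] a ~~M002~~',
    # Hypothetical: CONNECT without a preceding TAG.
    r'here is a sentence where a word could [CONNECT] a ~~M002~~',
]

results = []
for s in samples:
    m = final.search(s)
    # Record the full match, or None when the pattern does not apply.
    results.append(m.group(0) if m else None)

for r in results:
    print(r)
```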

To get only the ~~...~~ word in this case, I would definitely use grouping ( ), as the match length is variable.
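One way to do that grouping, sketched as a variant of the pattern above: the optional parts become non-capturing groups (?:...) and only the ~~...~~ word is captured, so re.findall returns just that word.

```python
import re

# Only (~~\w*~~) captures; everything else uses non-capturing groups.
final_grouped = re.compile(r'\[TAG\](?: \w*)?(?: \[CONNECT\](?: \w*)?)? (~~\w*~~)')

one_after_tag = r'here is sentence as an example where a [TAG] can ~~M001~~ a word'
one_after_connect = r'here is a sentence where a [TAG] could [CONNECT] a ~~M002~~'

words = final_grouped.findall(one_after_tag) + final_grouped.findall(one_after_connect)
print(words)  # ['~~M001~~', '~~M002~~']
```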
