I have a string where I am trying to match word patterns which appear either directly after, or one word after a tag. For example:
after_tag = r'here is sentence as an example where a [TAG] ~~M001~~ a word'
one_after_tag = r'here is sentence as an example where a [TAG] can ~~M001~~ a word'
I would also like to extend this to work with connecting words, which have also been tagged. This should also work within a window of one or two words after the [CONNECT] tag, such as:
after_connect = r'here is a sentence where a [TAG] could [CONNECT] ~~M002~~'
one_after_connect = r'here is a sentence where a [TAG] could [CONNECT] a ~~M002~~'
I have tried the following regex with the re package in Python.
regex_current = re.compile(r'((?:(?<=(\{TAG})))(.*?)\~\~[A-Z0-9]{4,5}\~\~)')
Please can anyone help? I've found the following website helpful in testing.
Here is one solution that matches both ~~...~~ words:
(?<=\[TAG\])( \w*)? ~~\w*~~
The look-behind requires [TAG] directly before the match, and the pattern then ends at the ~~...~~ word. If there is a word in between, it will be matched as well, so you can either group the second word or split the result and use the last element.
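As a minimal sketch of the "split the result" approach, the pattern can be tried against the two strings from the question. The helper name tagged_word is hypothetical and only used for illustration.

```python
import re

# Pattern from the answer: [TAG] must sit directly before the match,
# optionally followed by one word, then the ~~...~~ word.
TAG_PATTERN = re.compile(r'(?<=\[TAG\])( \w*)? ~~\w*~~')

after_tag = r'here is sentence as an example where a [TAG] ~~M001~~ a word'
one_after_tag = r'here is sentence as an example where a [TAG] can ~~M001~~ a word'


def tagged_word(text):
    """Return the ~~...~~ word following [TAG], or None (hypothetical helper)."""
    match = TAG_PATTERN.search(text)
    if match is None:
        return None
    # The match may include an intermediate word, so keep only the last token.
    return match.group(0).split()[-1]


print(tagged_word(after_tag))
print(tagged_word(one_after_tag))
```

Because the look-behind does not consume [TAG], the match itself contains only the optional word and the ~~...~~ word.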
To also cover the CONNECT examples, just OR (|) the same pattern:
(?<=\[TAG\])( \w*)? ~~\w*~~|(?<=\[CONNECT\])( \w*)? ~~\w*~~
It's exactly the same regular expression written twice, because the look-behind needs constant width and [TAG] and [CONNECT] differ in length. If you do not mind matching the tag itself, you can shorten this to:
\[(TAG|CONNECT)\]( \w*)? ~~\w*~~
This is really only needed if you expect there to be more ~~LettersAndDigits~~ words. If not, you can search for exactly that,
~~\w*~~
without anything fancy.
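The look-behind alternation and the shorter tag-consuming form can be compared on the CONNECT strings from the question. This is only a sketch: the constant names and the helper variable_width_lookbehind_fails are made up for illustration, and the helper simply checks that Python's re module rejects a look-behind whose branches differ in width.

```python
import re

# Alternation of two look-behinds, each with its own fixed width.
LOOKBEHIND = re.compile(
    r'(?<=\[TAG\])( \w*)? ~~\w*~~|(?<=\[CONNECT\])( \w*)? ~~\w*~~'
)
# Shorter form that consumes the tag as part of the match.
CONSUMING = re.compile(r'\[(TAG|CONNECT)\]( \w*)? ~~\w*~~')

after_connect = r'here is a sentence where a [TAG] could [CONNECT] ~~M002~~'
one_after_connect = r'here is a sentence where a [TAG] could [CONNECT] a ~~M002~~'


def variable_width_lookbehind_fails():
    """Check that a single look-behind over both tags is rejected by re."""
    try:
        re.compile(r'(?<=\[TAG\]|\[CONNECT\])( \w*)? ~~\w*~~')
    except re.error:
        return True
    return False


for text in (after_connect, one_after_connect):
    print(repr(LOOKBEHIND.search(text).group(0)))
    print(repr(CONSUMING.search(text).group(0)))
```

With the consuming form, group 1 tells you which tag preceded the word, at the cost of the tag being part of the match.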
Final addition
To ensure both TAG and CONNECT appear (CONNECT optionally):
\[TAG\]( \w*)?( \[CONNECT\]( \w*)?)? ~~[\w]*~~
Here is the fiddle. To get only the word in this case, I would definitely use grouping with (), as the match length is variable.
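One way to apply the grouping advice, sketched here as a variant rather than the exact pattern above: the helper groups are made non-capturing and a single capturing group is placed around the ~~...~~ word, so group(1) is always the word regardless of match length. The name extract_word is hypothetical.

```python
import re

# Variant of the final pattern: (?:...) groups do not capture,
# so the only capturing group is the ~~...~~ word itself.
FINAL = re.compile(r'\[TAG\](?: \w*)?(?: \[CONNECT\](?: \w*)?)? (~~\w*~~)')

samples = [
    r'here is sentence as an example where a [TAG] ~~M001~~ a word',
    r'here is sentence as an example where a [TAG] can ~~M001~~ a word',
    r'here is a sentence where a [TAG] could [CONNECT] ~~M002~~',
    r'here is a sentence where a [TAG] could [CONNECT] a ~~M002~~',
]


def extract_word(text):
    """Return the captured ~~...~~ word, or None if the pattern does not match."""
    match = FINAL.search(text)
    return match.group(1) if match else None


for sample in samples:
    print(extract_word(sample))
```

A string with two plain words between [TAG] and the ~~...~~ word does not match, since each optional slot allows at most one word.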