I have the following string:
the quick brown fox abc(1)(x)
with the following regex:
(?i)(\s{1})(abc\(1\)\([x|y]\))
and the output is
abc(1)(x)
which is expected. However, I would like the following output:
the quick brown fox abc(1)(x)
From the primary lookup "abc(1)(x)", I would like up to 5 words on either side of the lookup. My assumption is that spaces demarcate words.
Edit 1:
The 5 words on either side would be unknown for future examples. For instance, the string may be:
cat with a black hat is abc(1)(x) the quick brown fox jumps over the lazy dog.
In this case, the desired output would be:
with a black hat is abc(1)(x) the quick brown fox jumps
Edit 2:
Edited the expected output in the first example and added "up to" 5 words.
(?:[0-9A-Za-z_]+[^0-9A-Za-z_]+){0,5}abc\(1\)\([xy]\)(?:[^0-9A-Za-z_]+[0-9A-Za-z_]+){0,5}
Note that I've changed \w+ to [0-9A-Za-z_]+ and \W+ to [^0-9A-Za-z_]+ because, depending on your locale/Unicode settings, \W and \w might not act the way you expect in Python.
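To illustrate the point about \w, here is a small sketch of how Python 3 treats it for str patterns. The sample word is an arbitrary choice for illustration and does not come from the question.

```python
import re

# Hypothetical sample word containing a non-ASCII letter (e with acute accent).
word = "caf\u00e9"

# Python 3 str patterns are Unicode-aware by default, so \w matches the accented letter.
unicode_match = re.fullmatch(r"\w+", word)

# With re.ASCII, \w is restricted to [a-zA-Z0-9_], so the accented letter no longer matches.
ascii_match = re.fullmatch(r"\w+", word, flags=re.ASCII)

# The explicit character class behaves like the re.ASCII case regardless of flags.
explicit_match = re.fullmatch(r"[0-9A-Za-z_]+", word)

print(bool(unicode_match), bool(ascii_match), bool(explicit_match))
# prints: True False False
```

If ASCII-only words are what you want, either the explicit class or the re.ASCII flag should give the same behavior.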
Also note that I don't specifically look for spaces, just "non-word characters". This probably handles edge cases such as quote characters a little better. Regardless, this should get you most of the way there.
BTW: You're calling this "lookaround", but it really has nothing to do with regex lookaround, the regex feature.
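As a rough sketch, the pattern above can be applied in Python to both sample strings from the question. The names CONTEXT_RE and context are made up for this example.

```python
import re

# The pattern from this answer, split across lines only for readability.
CONTEXT_RE = re.compile(
    r"(?:[0-9A-Za-z_]+[^0-9A-Za-z_]+){0,5}"   # up to 5 words before the lookup
    r"abc\(1\)\([xy]\)"                        # the lookup itself
    r"(?:[^0-9A-Za-z_]+[0-9A-Za-z_]+){0,5}"   # up to 5 words after the lookup
)

def context(text):
    """Return the lookup with up to 5 words on either side, or None if absent."""
    m = CONTEXT_RE.search(text)
    return m.group(0) if m else None

print(context("the quick brown fox abc(1)(x)"))
# prints: the quick brown fox abc(1)(x)

print(context("cat with a black hat is abc(1)(x) the quick brown fox jumps over the lazy dog."))
# prints: with a black hat is abc(1)(x) the quick brown fox jumps
```

Because re.search returns the leftmost match and the leading group is greedy, the match starts as far left as the five-word limit allows.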
If I understand your requirements correctly, you want to do something like this:
(?:\w+[ ]){0,5}(abc\(1\)\([xy]\))(?:[ ]\w+){0,5}
Breakdown:
(?: # Start of a non-capturing group.
\w+ # Any word character repeated one or more times (basically, a word).
[ ] # Matches a space character literally.
) # End of the non-capturing group.
{0,5} # Match the previous group between 0 and 5 times.
( # Start of the first capturing group.
abc\(1\) # Matches "abc(1)" literally.
\([xy]\) # Matches "(x)" or "(y)". You don't need "|" inside a character class.
) # End of the capturing group.
(?:[ ]\w+){0,5} # Same as the non-capturing group above but the space is before the word.
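The breakdown above maps almost directly onto Python's re.VERBOSE mode, where whitespace outside character classes is ignored and # starts a comment. That is also why writing the space as [ ] is convenient. The following is a minimal sketch; the name PATTERN is chosen for this example only.

```python
import re

PATTERN = re.compile(
    r"""
    (?:\w+[ ]){0,5}       # up to 5 words, each followed by a space
    (abc\(1\)\([xy]\))    # group 1: the lookup itself
    (?:[ ]\w+){0,5}       # up to 5 words, each preceded by a space
    """,
    re.VERBOSE,
)

text = "cat with a black hat is abc(1)(x) the quick brown fox jumps over the lazy dog."
m = PATTERN.search(text)

print(m.group(0))  # prints: with a black hat is abc(1)(x) the quick brown fox jumps
print(m.group(1))  # prints: abc(1)(x)
```

group(0) is the lookup plus its context, while group(1) is the lookup alone.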
Notes:
If you want the match to be case-insensitive, use (?i) as you're doing already, or use the re.IGNORECASE flag.
If you want to support punctuation between words, you may replace [ ] with either \W+ (which means non-word characters) or with a character class that includes all the punctuation characters you want to support (e.g., [.,;?! ]).
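Both notes can be combined in one short sketch: re.IGNORECASE for case-insensitive matching and \W+ in place of [ ] so that punctuation between words is tolerated. The sample sentence is invented for illustration.

```python
import re

# \W+ replaces [ ] so commas, semicolons, question marks etc. may separate words.
PATTERN = re.compile(
    r"(?:\w+\W+){0,5}(abc\(1\)\([xy]\))(?:\W+\w+){0,5}",
    re.IGNORECASE,
)

# Hypothetical input with mixed case and punctuation.
text = "Wait, is it ABC(1)(X)? Yes; the quick brown fox jumps over."
m = PATTERN.search(text)

print(m.group(0))  # prints: Wait, is it ABC(1)(X)? Yes; the quick brown fox
print(m.group(1))  # prints: ABC(1)(X)
```

Note that punctuation between the words is kept in the match, since \W+ consumes whatever separates them.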