I have looked at lots of posts with similar titles, but I have found nothing that works with Python or even on this site: https://regex101.com
How can I match everything but a specific text?
My text:
1234_This is a text Word AB
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
1234_This is a text Word BCD
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -
I want to match Word \w+ and then the rest until the next 1234. So the result should be (returned groups are marked with parentheses):
(1234_This is a text (Word AB))(
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
)(1234_This is a text (Word BCD))(
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 - )
The first part is easy: matches = re.findall(r'1234_This is a text (Word \w+)', var)
But I am unable to achieve the next part. I have tried a negative lookahead, ^(?!1234), but then it no longer matches anything.
(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)
Alternatively, with the s modifier (re.DOTALL in Python) you can use the following:
(1234[\w ]+(Word \w+))((?:(?!1234).)*)
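A minimal sketch of how these two patterns might be used from Python, assuming the sample text from the question is stored in a variable called text (a name chosen here only for illustration). re.findall returns one tuple per match holding the three capture groups; the second call uses the . variant, which needs re.DOTALL to cross newlines.

```python
import re

# Sample input copied from the question; the variable name is illustrative.
text = """1234_This is a text Word AB
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
1234_This is a text Word BCD
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -"""

# [\s\S] matches any character including newlines, so no flag is needed.
pattern = r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)'
matches = re.findall(pattern, text)

# Same idea with '.', which only crosses newlines when re.DOTALL is set.
dotall_matches = re.findall(
    r'(1234[\w ]+(Word \w+))((?:(?!1234).)*)', text, re.DOTALL
)

for header, word, body in matches:
    print(repr(header), repr(word), repr(body))
```

Note that (?!1234) stops the third group at the first occurrence of 1234 anywhere in the text, so a data line that happened to contain 1234 would end the block early.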
(1234[\w ]+(Word \w+)): capture the following into capture group 1
  1234: match this literally
  [\w ]+: match one or more word characters or spaces
  (Word \w+): capture the following into capture group 2
    Word (followed by a space): match this literally
    \w+: match any word character one or more times
((?:(?!1234)[\s\S])*): capture the following into capture group 3
  (?:(?!1234)[\s\S])*: match the following any number of times (tempered greedy token)
    (?!1234): negative lookahead ensuring what follows is not 1234
    [\s\S]: match any character, including newlines
As you stated:
I want to match Word \w+ and then the rest until the next 1234.
Do you want something like this?
import re

# Match the header line, then every following line that does not start with 1234.
pattern = r'((1234_This is a text) (Word\s\w+))((?:\n(?!1234).*)*)'
string = """1234_This is a text Word AB
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
1234_This is a text Word BCD
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -"""

# group(1) is the full header, group(3) the "Word ..." part, group(4) the block below it.
for find in re.finditer(pattern, string):
    print("this is group_1 {}".format(find.group(1)))
    print("this is group_3 {}".format(find.group(3)))
    print("this is group_4 {}".format(find.group(4)))
output:
this is group_1 1234_This is a text Word AB
this is group_3 Word AB
this is group_4
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
this is group_1 1234_This is a text Word BCD
this is group_3 Word BCD
this is group_4
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -
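If the captured blocks need further processing, the groups could be collected into a dictionary keyed by the Word ... label. This is only a sketch under assumptions the question does not state: the helper name parse_blocks is hypothetical, and every data row is assumed to have the three columns shown in the sample (Internet, address, ping).

```python
import re

def parse_blocks(text):
    """Map each 'Word ...' label to a list of (address, ping) pairs.

    Hypothetical helper: assumes each data row looks like
    'Internet <address> <ping>', as in the sample input.
    """
    # Header line, then every following line that does not start with 1234.
    block_re = re.compile(r'1234[\w ]+(Word \w+)((?:\n(?!1234).*)*)')
    result = {}
    for label, body in block_re.findall(text):
        rows = []
        for line in body.splitlines():
            parts = line.split()
            # Skip blank lines and the 'Protocol Address ping' column header.
            if len(parts) == 3 and parts[0] == 'Internet':
                rows.append((parts[1], parts[2]))
        result[label] = rows
    return result

sample = """1234_This is a text Word AB
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
1234_This is a text Word BCD
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -"""

blocks = parse_blocks(sample)
print(blocks['Word BCD'])
```

The ping value is kept as a string here because the sample uses - for missing values; converting it to a number would need a decision about how to represent those.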