Match everything except a specific string

Question

I have looked at many posts with similar titles, but I have found nothing that works in Python or even on this site: https://regex101.com

How can I match everything but a specific text?

My text:

1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            - 
Internet  1.1.1.2            25 
Internet  1.1.1.3            8 
Internet  1.1.1.4            - 

1234_This is a text Word BCD    
Protocol  Address          ping
Internet  2.2.2.1            10 
Internet  2.2.2.2            -

I want to match Word \w+ and then the rest until the next 1234. So the result should be (returned groups are marked with parentheses):

(1234_This is a text (Word AB))(

Protocol  Address          ping
Internet  1.1.1.1            - 
Internet  1.1.1.2            25 
Internet  1.1.1.3            8 
Internet  1.1.1.4            - 

)(1234_This is a text (Word BCD))(
Protocol  Address          ping
Internet  2.2.2.1            10 
Internet  2.2.2.2            - )

The first part is easy: matches = re.findall(r'1234_This is a text (Word \w+)', var). But I am unable to get the next part to work. I have tried a negative lookahead, ^(?!1234), but then it no longer matches anything.

Answer 1

Code

(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)

With the s modifier (re.DOTALL in Python), the dot also matches newlines, so you can use the following instead.

(1234[\w ]+(Word \w+))((?:(?!1234).)*)

Explanation

(1234[\w ]+(Word \w+)) Capture the following into capture group 1
- 1234 Match this literally
- [\w ]+ Match one or more word characters or spaces
- (Word \w+) Capture the following into capture group 2
  - Word Match this literally (note the trailing space)
  - \w+ Match any word character one or more times
((?:(?!1234)[\s\S])*) Capture the following into capture group 3
- (?:(?!1234)[\s\S])* Match the following any number of times (tempered greedy token)
  - (?!1234) Negative lookahead ensuring that what follows is not 1234
  - [\s\S] Match any character, including newlines
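As a rough illustration of how the pattern above behaves in Python, here is a minimal sketch. The variable names (text, pattern, records, dotall) and the shortened sample text are assumptions made for the example, not part of the original answer.

```python
import re

# Shortened copy of the sample text from the question (assumed for illustration).
text = (
    "1234_This is a text Word AB\n"
    "\n"
    "Protocol  Address          ping\n"
    "Internet  1.1.1.1            -\n"
    "Internet  1.1.1.4            -\n"
    "\n"
    "1234_This is a text Word BCD\n"
    "Protocol  Address          ping\n"
    "Internet  2.2.2.2            -"
)

# Tempered greedy token: consume one character at a time, but only
# while the text at the current position does not start with "1234".
pattern = re.compile(r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)')

# Each tuple is (header line, "Word ..." part, body up to the next header).
records = [m.groups() for m in pattern.finditer(text)]
for header, word, body in records:
    print(repr(header), repr(word), repr(body))

# Same pattern written with "." and the DOTALL flag (the "s" modifier).
dotall = re.compile(r'(1234[\w ]+(Word \w+))((?:(?!1234).)*)', re.DOTALL)
assert [m.groups() for m in dotall.finditer(text)] == records
```

Note that the lookahead checks for 1234 anywhere, not only at the start of a line, so a body that happens to contain 1234 would end the match early.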

Answer 2

As you stated:

I want to match Word \w+ and then the rest until the next 1234.

Do you want something like this?

import re

# Group 1: whole header, group 2: fixed prefix, group 3: the "Word ..." part,
# group 4: the lines that follow; the (?!\n\n) lookahead ends the repetition
# where two newlines come next.
pattern = r'((1234_This is a text) (Word\s\w+))((\n?.*(?!\n\n))*)'
string = """1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            -

1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -"""

# Print the header, the "Word ..." part, and the body of every record.
for find in re.finditer(pattern, string, re.M):
    print("this is group_1 {}".format(find.group(1)))
    print("this is group_3 {}".format(find.group(3)))
    print("this is group_4 {}".format(find.group(4)))

Output:

this is group_1 1234_This is a text Word AB
this is group_3 Word AB
this is group_4 

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            
this is group_1 1234_This is a text Word BCD
this is group_3 Word BCD
this is group_4 
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -
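
In the output above, group 4 of the first record ends at the 1.1.1.4 line without its trailing "-": the lookahead (?!\n\n) rejects the position right before the blank line, so .* backtracks by one character. If that character matters, one possible variant is sketched below. It is not the pattern from this answer; it swaps in a lazy quantifier bounded by a lookahead for the next header line or the end of the string. The names (text, pattern, records) and the shortened sample text are assumptions for illustration.

```python
import re

# Shortened copy of the sample text (assumed for illustration).
text = (
    "1234_This is a text Word AB\n"
    "\n"
    "Protocol  Address          ping\n"
    "Internet  1.1.1.1            -\n"
    "Internet  1.1.1.4            -\n"
    "\n"
    "1234_This is a text Word BCD\n"
    "Protocol  Address          ping\n"
    "Internet  2.2.2.2            -"
)

pattern = re.compile(
    r'^(1234_This is a text (Word \w+))'  # header line and its "Word ..." part
    r'(.*?)'                              # body, as short as possible
    r'(?=^1234|\Z)',                      # stop before the next header or at the end
    re.M | re.S,                          # ^ matches at line starts, . matches newlines
)

# Each tuple is (header line, "Word ..." part, body).
records = [m.groups() for m in pattern.finditer(text)]
for header, word, body in records:
    print(word, repr(body.strip()))
```

Because the lookahead is anchored with ^ under re.M, a 1234 in the middle of a line does not end the body.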

Answer 1
3 ACCEPTED 2017-11-30 15:52:25

Answer 2
1 2017-11-30 15:56:05
