
Match everything except a specific string

I have looked at lots of posts with similar titles, but I have found nothing that works in Python, or even on this site: https://regex101.com

How can I match everything but a specific text?

My text:

1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            - 
Internet  1.1.1.2            25 
Internet  1.1.1.3            8 
Internet  1.1.1.4            - 

1234_This is a text Word BCD    
Protocol  Address          ping
Internet  2.2.2.1            10 
Internet  2.2.2.2            - 

I want to match Word \w+ and then everything that follows, up to the next 1234. So the result should be (returned groups are marked with parentheses):

(1234_This is a text (Word AB))(

Protocol  Address          ping
Internet  1.1.1.1            - 
Internet  1.1.1.2            25 
Internet  1.1.1.3            8 
Internet  1.1.1.4            - 

)(1234_This is a text (Word BCD))(    
Protocol  Address          ping
Internet  2.2.2.1            10 
Internet  2.2.2.2            - )

The first part is easy:

matches = re.findall(r'1234_This is a text (Word \w+)', var)

But I have been unable to get the next part to work. I tried a negative lookahead, ^(?!1234), but then it no longer matches anything.
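One likely reason the lookahead appeared to match nothing is that a lookahead is zero-width: it only checks what follows and consumes no characters itself. The sketch below illustrates this on a shortened, made-up sample (the text and variable names are assumptions, not the original data).

```python
import re

# Shortened stand-in for the real input (hypothetical sample).
text = (
    "1234_header Word AB\n"
    "row one\n"
    "row two\n"
    "1234_header Word BCD\n"
    "row three"
)

# The lookahead alone consumes nothing, so each match is an empty string
# located at the start of a line that does not begin with 1234.
zero_width = re.findall(r'^(?!1234)', text, re.M)

# Once something after the lookahead consumes characters, the lines appear.
non_header_lines = re.findall(r'^(?!1234).*$', text, re.M)

print(zero_width)
print(non_header_lines)
```

The lookahead therefore has to be combined with a pattern that actually consumes the text to be captured.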

Code


(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)

With the s (DOTALL) modifier, which is re.S in Python, the dot also matches newlines, so you can use the following instead:

(1234[\w ]+(Word \w+))((?:(?!1234).)*)
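A minimal sketch of how both patterns could be applied in Python, using a shortened copy of the sample from the question. The variable names are illustrative; re.S is Python's spelling of the s modifier.

```python
import re

# Shortened copy of the sample text from the question.
text = (
    "1234_This is a text Word AB\n"
    "\n"
    "Protocol  Address          ping\n"
    "Internet  1.1.1.1            -\n"
    "Internet  1.1.1.2            25\n"
    "\n"
    "1234_This is a text Word BCD\n"
    "Protocol  Address          ping\n"
    "Internet  2.2.2.1            10\n"
    "Internet  2.2.2.2            -"
)

# Variant 1: [\s\S] matches any character, so no flag is needed.
portable = r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)'
# Variant 2: plain dot, which needs re.S so that it also matches newlines.
dotall = r'(1234[\w ]+(Word \w+))((?:(?!1234).)*)'

matches = re.findall(portable, text)
matches_dotall = re.findall(dotall, text, re.S)

# Each result is a tuple: (header line, "Word ..." part, block body).
for header, word, body in matches:
    print(repr(header), repr(word), repr(body))
```

Both variants should return the same three groups per block; the body stops right before the next 1234 or at the end of the input.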

Explanation

  • (1234[\w ]+(Word \w+)) Capture the following into capture group 1
    • 1234 Match this literally
    • [\w ]+ Match one or more word characters or spaces
    • (Word \w+) Capture the following into capture group 2
      • Word Match this literally (note the trailing space)
      • \w+ Match any word character one or more times
  • ((?:(?!1234)[\s\S])*) Capture the following into capture group 3
    • (?:(?!1234)[\s\S])* Match the following any number of times (tempered greedy token)
      • (?!1234) Negative lookahead ensuring what follows is not 1234
      • [\s\S] Match any single character, including newlines
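For comparison, the same "everything up to the next 1234" idea can also be written with a lazy quantifier followed by a lookahead. This is a different spelling, not the pattern from the answer above; the sketch below checks on a small made-up sample that both give the same groups.

```python
import re

# Small hypothetical sample with two blocks.
text = (
    "1234_This is a text Word AB\n"
    "Internet  1.1.1.1            -\n"
    "\n"
    "1234_This is a text Word BCD\n"
    "Internet  2.2.2.1            10"
)

# Tempered greedy token: consume one character at a time,
# but only while the next four characters are not "1234".
tempered = r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)'

# Lazy alternative: take as little as possible until the lookahead
# sees either the next "1234" or the end of the string (\Z).
lazy = r'(1234[\w ]+(Word \w+))([\s\S]*?)(?=1234|\Z)'

tempered_groups = re.findall(tempered, text)
lazy_groups = re.findall(lazy, text)

print(tempered_groups == lazy_groups)
for header, word, body in lazy_groups:
    print(repr(word), repr(body))
```

The tempered token keeps the stop condition inside the repeated group, while the lazy form moves it to a single lookahead after the group.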

As you stated:

I want to match Word \w+ and then the rest until the next 1234.

Do you want something like this?

import re

# Group 1: whole header line, group 2: fixed prefix, group 3: "Word ..." part,
# group 4: the lines that follow, up to a blank line (see the (?!\n\n) lookahead).
pattern = r'((1234_This is a text) (Word\s\w+))((\n?.*(?!\n\n))*)'
string = """1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            -

1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -"""

# finditer yields one match object per block.
matches = re.finditer(pattern, string, re.M)
for find in matches:
    print("this is group_1 {}".format(find.group(1)))
    print("this is group_3 {}".format(find.group(3)))
    print("this is group_4 {}".format(find.group(4)))

output:

this is group_1 1234_This is a text Word AB
this is group_3 Word AB
this is group_4 

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            
this is group_1 1234_This is a text Word BCD
this is group_3 Word BCD
this is group_4 
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -
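Note that in the output above, the last row of the first block (1.1.1.4) is printed without its trailing "-". The (?!\n\n) lookahead appears to make .* give back one character when the line is followed by a blank line. The sketch below reproduces this on a shortened, made-up sample and compares it with the tempered greedy token pattern from the other answer; the variable names are illustrative.

```python
import re

# Shortened sample: the first block ends with "-" followed by a blank line.
text = (
    "1234_This is a text Word AB\n"
    "\n"
    "Internet  1.1.1.1            -\n"
    "\n"
    "1234_This is a text Word BCD\n"
    "Internet  2.2.2.1            10"
)

# Pattern from this answer: the block body is group 4.
blank_line_pattern = r'((1234_This is a text) (Word\s\w+))((\n?.*(?!\n\n))*)'
# Tempered greedy token pattern from the other answer: the body is group 3.
tempered_pattern = r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)'

blank_line_bodies = [m.group(4) for m in re.finditer(blank_line_pattern, text)]
tempered_bodies = [m.group(3) for m in re.finditer(tempered_pattern, text)]

# The first body loses its final "-" with the blank-line lookahead,
# while the tempered token keeps the row intact.
print(repr(blank_line_bodies[0]))
print(repr(tempered_bodies[0]))
```

If the trailing character matters, or if blocks are not always separated by a blank line, the tempered greedy token version is probably the safer choice.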
