
RegEx: Match all lines except for a specific substring

Below is the list:

cf-ab1
cf-bc2
cf-ab1-hotfix
cf-bc2-hotfix
cf-ab1-canary
cf-cd1-staging
cf-cd1-staging2
cf-cd1
cf-cd1-sic-staging
cf-cd1-sagdf-staging

I would like to match everything except cf-cd1-staging, cf-cd1-staging2 and cf-ab1-canary. I am running the regex below:

 ^((?!canary|staging).)*$

But this matches only the lines that do not contain staging or canary anywhere, which is not my desired output.

Could you please help? My desired matches are:

cf-ab1
cf-bc2
cf-ab1-hotfix
cf-bc2-hotfix
cf-cd1
cf-cd1-sic-staging
cf-cd1-sagdf-staging

Regards,

Rohith

Try this:

import re

lines = [
    "cf-ab1", "cf-bc2", "cf-ab1-hotfix", "cf-bc2-hotfix", "cf-ab1-canary",
    "cf-cd1-staging", "cf-cd1-staging2", "cf-cd1", "cf-cd1-sic-staging",
    "cf-cd1-sagdf-staging",
]

# Reject any line that contains "ab1-canary" or "cd1-staging" anywhere.
# ("cf-ab1-canary" is already covered by "ab1-canary", so it is not listed.)
line_compile = re.compile(r'^(?!.*(ab1-canary|cd1-staging)).*$')

matched = []

for line in lines:
    if line_compile.match(line):
        matched.append(line)

print(matched)
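If the rule is really "drop a name when staging or canary comes directly after the cf-<id>- prefix", the lookahead can be anchored to that position instead of naming each id. This is only a sketch under that assumption; the question does not state the rule explicitly, and the pattern name below is made up for illustration.

```python
import re

lines = [
    "cf-ab1", "cf-bc2", "cf-ab1-hotfix", "cf-bc2-hotfix", "cf-ab1-canary",
    "cf-cd1-staging", "cf-cd1-staging2", "cf-cd1", "cf-cd1-sic-staging",
    "cf-cd1-sagdf-staging",
]

# Assumption: a line is unwanted only when "staging" or "canary" is the
# first segment after "cf-<id>-". \w does not match "-", so "cf-\w+-" can
# only consume the id segment and nothing further.
prefix_filter = re.compile(r"^(?!cf-\w+-(?:staging|canary)).*$")

matched = [line for line in lines if prefix_filter.match(line)]
print(matched)
```

Unlike the version with hard-coded ids, this one would also drop something like cf-bc2-staging, which may or may not be what is wanted.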

As always with RegEx, there are many possible solutions. I came up with one on the fly, but you could argue that it is overfitted to that dataset and not very general.

^cf-\w\w\d(-[hs][oia][tcg].+?)?$

I simply wrote the "allowed" letters for each position in square brackets until the undesired matches weren't possible anymore. I also wrapped the second half in ()? so that the short entries without a suffix (cf-ab1, cf-bc2, cf-cd1) are matched as well.
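A quick way to check this allow-list pattern, and to see the overfitting mentioned above, is to run it against the list from the question plus one new name. The name cf-ab1-release is hypothetical and only there to show how the pattern reacts to a suffix it was not written for.

```python
import re

lines = [
    "cf-ab1", "cf-bc2", "cf-ab1-hotfix", "cf-bc2-hotfix", "cf-ab1-canary",
    "cf-cd1-staging", "cf-cd1-staging2", "cf-cd1", "cf-cd1-sic-staging",
    "cf-cd1-sagdf-staging",
]

# The allow-list pattern from the answer, unchanged.
allow = re.compile(r"^cf-\w\w\d(-[hs][oia][tcg].+?)?$")

allowed = [line for line in lines if allow.match(line)]
print(allowed)

# Hypothetical new name: the suffix starts with "r", which is not in [hs],
# so the pattern rejects it even though nothing marks it as unwanted.
print(bool(allow.match("cf-ab1-release")))
```

So the pattern gives the desired result for this exact list, but every new suffix needs its letters added to the brackets by hand.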

