Python - Regex - Matching all beginning sequence excluding another pattern

Goal: return two groups. Group 1 is the leading run of digits and dividers, but it must stop before a size sequence such as 300x250. Group 2 is everything else.

## List of strings and desired result
strs = [
   '151002 - Some name',       ## ('151002 - ', 'Some name')
   'Another name here',        ## ('', 'Another name here')
   '13-10-07_300x250_NoName',  ## ('13-10-07_', '300x250_NoName')
   '728x90 - nice name'        ## ('', '728x90 - nice name')
]

Attempted Pattern

## This pattern is close
## (raw string, so \b and \d reach the regex engine unchanged)
pat = r'''
^                       ## From start of string
(                       ## Group 1
   [0-9\- ._/]*         ## Any number or divider
   (?!                  ## Negative Lookahead
      (?:\b|[\- ._/\|]) ## Beginning of word or divider
      \d{1,3}           ## Size start
      (?:x|X)           ## big or small 'x'
      \d{1,3}           ## Size end
   )           
)
(                       ## Group 2
   .*                   ## Everything else
)
'''

## Matching
import re
[re.compile(pat, re.VERBOSE).match(s).groups() for s in strs]

Attempted Pattern Result

[
   ('151002 - ', 'Some name'),      ## Good
   ('', 'Another name here'),       ## Good
   ('13-10-07_300', 'x250_NoName'), ## Error
   ('728', 'x90 - nice name')       ## Error
]

I think this might give you what you want:

[re.match(r"^([^x]+[\-_]\s?)?(.*$)", s).groups() for s in strs]

Explanation of the regex: starting at the beginning of the string, match one or more characters that are not an x, followed by a hyphen or underscore and then an optional whitespace character. That is group one, and the whole group is optional (zero or one occurrence). Group two is everything else.
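
For reference, here is a small self-contained run of that pattern against the strings from the question. It is only a sketch for checking the behaviour; the names pat and results are just for illustration.

```python
import re

strs = [
    '151002 - Some name',
    'Another name here',
    '13-10-07_300x250_NoName',
    '728x90 - nice name',
]

# Group 1: optional prefix without any 'x', ending in '-' or '_' plus optional whitespace.
# Group 2: the rest of the string.
pat = re.compile(r"^([^x]+[\-_]\s?)?(.*$)")

results = [pat.match(s).groups() for s in strs]
for r in results:
    print(r)
# ('151002 - ', 'Some name')
# (None, 'Another name here')
# ('13-10-07_', '300x250_NoName')
# (None, '728x90 - nice name')
```

Note that when the optional group does not take part in the match, groups() reports it as None rather than the empty string shown in the desired result.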

EDIT:

If your strings can have letters other than x among the numbers, you can exclude every letter from group one instead:

[re.match(r"^([^a-zA-Z]+[\-_]\s?)?(.*$)", s).groups() for s in strs]
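
If the empty string is wanted for a missing prefix, as in the desired result from the question, one option is the default argument of groups(). A minimal sketch, assuming the same strs list as in the question:

```python
import re

strs = [
    '151002 - Some name',
    'Another name here',
    '13-10-07_300x250_NoName',
    '728x90 - nice name',
]

# Group 1 may not contain any ASCII letter, so it cannot run into the name part.
pat = re.compile(r"^([^a-zA-Z]+[\-_]\s?)?(.*$)")

# default='' replaces the None reported for a group that did not participate.
results = [pat.match(s).groups(default='') for s in strs]
for r in results:
    print(r)
# ('151002 - ', 'Some name')
# ('', 'Another name here')
# ('13-10-07_', '300x250_NoName')
# ('', '728x90 - nice name')
```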

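I think you misunderstand the use of lookaheads. In your pattern the negative lookahead is checked only once, after the greedy character class has already consumed the digits of the size, so it never sees the size sequence. The lookahead has to be checked before each character that group 1 consumes. This pattern should work: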

((?:(?!\d{1,3}x\d{1,3})[0-9\- ._/])*)(.*)

If you want an explanation, just ask, because I know it is a disgusting regex :)
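
In the meantime, the same pattern can be spelled out with re.VERBOSE, which may make it easier to read. This is a sketch rather than the exact pattern above: it uses [xX] instead of x so that an upper-case X is also treated as a size, as in the question's attempt.

```python
import re

strs = [
    '151002 - Some name',
    'Another name here',
    '13-10-07_300x250_NoName',
    '728x90 - nice name',
]

pat = re.compile(r'''
    (                                   # Group 1: leading digits and dividers
        (?:
            (?! \d{1,3} [xX] \d{1,3} )  # refuse to go on if a size (e.g. 300x250) starts here
            [0-9\- ._/]                 # otherwise consume one digit or divider
        )*
    )
    ( .* )                              # Group 2: everything else
''', re.VERBOSE)

results = [pat.match(s).groups() for s in strs]
for r in results:
    print(r)
# ('151002 - ', 'Some name')
# ('', 'Another name here')
# ('13-10-07_', '300x250_NoName')
# ('', '728x90 - nice name')
```

Because the lookahead sits inside the repeated group, it is tested at every position before a character is consumed, so group 1 stops right before the size instead of swallowing its first digits.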
