Goal: split each string into two groups. Group 1 should capture the leading run of digits and dividers but stop before a size sequence such as 300x250; group 2 should capture everything else.
## List of strings and desired result
strs = [
'151002 - Some name', ## ('151002 - ', 'Some name')
'Another name here', ## ('', 'Another name here')
'13-10-07_300x250_NoName', ## ('13-10-07_', '300x250_NoName')
'728x90 - nice name' ## ('', '728x90 - nice name')
]
Attempted Pattern
## This pattern is close
##
pat = r'''
^ ## From start of string
( ## Group 1
[0-9\- ._/]* ## Any number or divider
(?! ## Negative Lookahead
(?:\b|[\- ._/\|]) ## Beginning of word or divider
\d{1,3} ## Size start
(?:x|X) ## big or small 'x'
\d{1,3} ## Size end
)
)
( ## Group 2
.* ## Everything else
)
'''
## Matching
import re
[re.compile(pat, re.VERBOSE).match(s).groups() for s in strs]
Attempted Pattern Result
[
('151002 - ', 'Some name'), ## Good
('', 'Another name here'), ## Good
('13-10-07_300', 'x250_NoName'), ## Error
('728', 'x90 - nice name') ## Error
]
I think this might give you what you want:
[re.match(r"^([^x]+[\-_]\s?)?(.*$)", s).groups() for s in strs]
Explanation of the regex: starting at the beginning of the string, match one or more characters that are not an x, followed by a hyphen or underscore and then an optional whitespace character. That is group one, and it is optional (zero or one occurrence). Group two is everything else.
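A minimal sketch of running that pattern against the strings from the question. One detail worth knowing: when the optional group one does not take part in the match, `groups()` returns `None` for it, so the sketch passes `''` as the default to get the empty string the question asks for.

```python
import re

# Sample strings copied from the question
strs = [
    '151002 - Some name',
    'Another name here',
    '13-10-07_300x250_NoName',
    '728x90 - nice name',
]

pat = re.compile(r"^([^x]+[\-_]\s?)?(.*$)")

# groups('') substitutes '' for an optional group that did not participate
results = [pat.match(s).groups('') for s in strs]

for s, r in zip(strs, results):
    print(repr(s), '->', r)
```

Keep in mind that this relies on the name part containing no hyphen or underscore after an x-free prefix, so it is a fit for these samples rather than a general solution.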
EDIT:
Assuming your strings can have a letter other than x among the numbers (for example an uppercase X), you can change the pattern to exclude all letters:
[re.match(r"^([^a-zA-Z]+[\-_]\s?)?(.*$)", s).groups() for s in strs]
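To see the difference between the two patterns, here is a small sketch using a made-up input with an uppercase X in the size. The string `sample` is an assumption for illustration and is not one of the strings in the question.

```python
import re

# Hypothetical input: same as the third sample, but with an uppercase 'X'
sample = '13-10-07_300X250_NoName'

first = re.compile(r"^([^x]+[\-_]\s?)?(.*$)")
edited = re.compile(r"^([^a-zA-Z]+[\-_]\s?)?(.*$)")

# The first pattern only stops at a lowercase 'x', so group one runs
# up to the last underscore in the string.
first_result = first.match(sample).groups('')

# The edited pattern stops at the first letter of any kind, so group one
# ends at the last divider before the size.
edited_result = edited.match(sample).groups('')

print(first_result)
print(edited_result)
```

With the first pattern the size ends up inside group one; with the edited pattern it stays in group two.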
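I think you misunderstand the use of lookaheads. In your pattern the negative lookahead is tested only after the greedy character class has already consumed the digits of the size (for example the 300 in 300x250), so at that point it is looking at x250 and never sees the size. The lookahead has to be tested before every character that group 1 consumes. This pattern should work: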
((?:(?!\d{1,3}x\d{1,3})[0-9\- ._/])*)(.*)
If you want an explanation, just ask; I know it is a disgusting regex :)
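For readers who want to try it, here is the same pattern written out in verbose form and run against the strings from the question. The comments are my reading of how it works; the pattern itself is unchanged, so it still matches only a lowercase x.

```python
import re

# Sample strings copied from the question
strs = [
    '151002 - Some name',
    'Another name here',
    '13-10-07_300x250_NoName',
    '728x90 - nice name',
]

pat = re.compile(r'''
    (                          # Group 1: leading digits and dividers
      (?:
        (?!\d{1,3}x\d{1,3})    # stop if a size such as 300x250 starts here
        [0-9\- ._/]            # otherwise consume one digit or divider
      )*
    )
    (.*)                       # Group 2: everything else
''', re.VERBOSE)

results = [pat.match(s).groups() for s in strs]

for s, r in zip(strs, results):
    print(repr(s), '->', r)
```

Because the lookahead sits inside the repeated group, it is checked at every position before a character is consumed, which is what keeps the size out of group 1. Adding `re.IGNORECASE` to the flags would be one way to also accept an uppercase X, if that is needed.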