
[FORKING] Python Regex - re.sub and re.findall Interesting Challenges

Not sure if this is something that should be a bounty. I just want to understand regex better.

I checked the responses in two threads, "Regex to match pattern.one skip newlines and characters until pattern.two" and "Regex to match if given text is not found and match as little as possible", and read about Tempered Greedy Token Solutions and Explicit Greedy Alternation Solutions on RexEgg, but admittedly the explanations baffled me.

I spent the last day fiddling mainly with re.sub (and with re.findall) because re.sub's behaviour seems odd to me.


Problem 1:

Given the strings below, each made of segments followed by /, how would I write a SINGLE regex (using only re.sub or re.findall) that uses alternating capture groups, which must use [\\S]+/, to get the desired output?

>>> string_1 = 'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/'
>>> string_2 = 'variety.com/2017/biz/the/life/of/madam/green/news/tax-march-donald-trump-protest-1202031487/'
>>> string_3 = 'variety.com/2017/biz/the/life/of/news/tax-march-donald-trump-protest-1202031487/the/days/of/our/lives'

Desired Output Given the Conditions(!!)

tax-march-donald-trump-protest-

CONDITIONS: The regex must use alternating capture groups built on ([\\S]+) or ([\\S]+?)/ to capture the other segments, but ignore them if they don't contain -

I'M WELL AWARE that it would be better to use re.findall('([\\-]*(?:[^/]+?\\-)+)[\\d]+', string) or something similar, but I want to know if I can use [\\S]+ or ([\\S]+) or ([\\S]+?)/ and tell the regex that, if those are captured, it should ignore the result when it contains / or doesn't contain -, while still using an alternating capture group.

I KNOW I don't need to use [\\S]+ or ([\\S]+), but I want to see if there is an extra directive I can use to make the regex reject some characters those two would normally capture.
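
For reference, here is a minimal sketch of how the baseline re.findall pattern mentioned above behaves on the three sample strings. It is only a check of that one pattern, written as a raw string so the backslashes are not doubled; the names strings, baseline and baseline_results are made up for this illustration.

```python
import re

# The three sample strings from the question.
strings = [
    'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/',
    'variety.com/2017/biz/the/life/of/madam/green/news/tax-march-donald-trump-protest-1202031487/',
    'variety.com/2017/biz/the/life/of/news/tax-march-donald-trump-protest-1202031487/the/days/of/our/lives',
]

# Same regex as '([\\-]*(?:[^/]+?\\-)+)[\\d]+', as a raw string.
# [^/] keeps every match inside a single /-delimited segment, and the
# trailing [\d]+ sits outside the capture group, so the digits are dropped.
baseline = re.compile(r'([\-]*(?:[^/]+?\-)+)[\d]+')

# findall returns the contents of group 1 for each match.
baseline_results = [baseline.findall(s) for s in strings]
for result in baseline_results:
    print(result)  # ['tax-march-donald-trump-protest-']
```

Segments such as biz or news never match because they contain no dash followed by digits, which is the filtering the question is trying to reproduce with [\\S]+.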

Posted per request:

(?:(?!/)[\S])*-(?:(?!/)[\S])*

https://regex101.com/r/azrwjO/1

Explained

 (?:                           # Non-capturing group
      (?! / )                       # Next character is not a forward slash
      [\S]                          # One non-whitespace character
 )*                            # End group, repeat 0 or more times
 -                             # A literal dash must exist
 (?:                           # Same group as above
      (?! / )
      [\S]
 )*
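
A quick way to try this pattern in Python, as a sketch only: the names segment_with_dash, sample, segments and trimmed are invented for the example, and stripping the trailing digits is an extra step assumed here to reach the asker's desired output, not part of the pattern itself.

```python
import re

# Each (?:(?!/)[\S])* consumes non-whitespace characters one at a time,
# refusing to step onto a '/'. The literal '-' in the middle forces the
# matched segment to contain at least one dash.
segment_with_dash = re.compile(r'(?:(?!/)[\S])*-(?:(?!/)[\S])*')

sample = 'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/'

# No capture groups, so findall returns the whole match for each hit.
segments = segment_with_dash.findall(sample)
print(segments)  # ['tax-march-donald-trump-protest-1202031487']

# Assumption: the trailing digits are unwanted, so strip them afterwards.
trimmed = [s.rstrip('0123456789') for s in segments]
print(trimmed)  # ['tax-march-donald-trump-protest-']
```

The pattern keeps the whole dash-containing segment, digits included, so the numeric suffix has to be removed separately or excluded by a different pattern.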

You could use

/([-a-z]+)-\d+

and take the first capturing group; see a demo on regex101.com.
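
In Python this could look like the sketch below. The variable names are made up for the example, and the second pattern, with the dash moved inside the group, is a small variation assumed here because the desired output in the question ends with a dash.

```python
import re

string_3 = ('variety.com/2017/biz/the/life/of/news/'
            'tax-march-donald-trump-protest-1202031487/the/days/of/our/lives')

# Pattern from the answer: a slash, then lowercase letters and dashes,
# then a dash followed by digits. The final dash is outside group 1.
m = re.search(r'/([-a-z]+)-\d+', string_3)
slug = m.group(1) if m else None
print(slug)  # tax-march-donald-trump-protest

# Variation (an assumption, not from the answer): keep the last dash
# inside the group so the result matches the asker's desired output.
m_dash = re.search(r'/([-a-z]+-)\d+', string_3)
slug_with_dash = m_dash.group(1) if m_dash else None
print(slug_with_dash)  # tax-march-donald-trump-protest-
```

Segments such as /biz or /news are skipped because nothing in them is followed by a dash and digits.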
