Regex - Python matching between string and first occurrence

I'm having a hard time grasping regex no matter how much documentation I read. I'm trying to match everything between a string and the first occurrence of &. This is what I have:

import re
link = "group.do?sys_id=69adb887157e450051e85118b6ff533c&&"
rex = re.compile(r"group\.do\?sys_id=(.?)&")
sysid = rex.search(link).groups()[0]

I'm using https://regex101.com/#python to help me validate my regex, and I can kind of get rex = re.compile("user_group.do?sys_id=(.*)&") to work, but the .* is greedy and matches up to the last &, and I'm looking to match up to the first &.

I thought .? matches zero or one time.

You don't necessarily need regular expressions here. Use urlparse instead:

>>> from urlparse import urlparse, parse_qs 
>>> parse_qs(urlparse(link).query)['sys_id'][0]
'69adb887157e450051e85118b6ff533c'

In Python 3, change the import to:

from urllib.parse import urlparse, parse_qs
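
Put together for Python 3, the approach above might look like the following sketch. The extract_sys_id helper is a name made up here for illustration, and returning None when the parameter is missing is an assumption about what the caller wants, not something the question specifies.

```python
from urllib.parse import urlparse, parse_qs

link = "group.do?sys_id=69adb887157e450051e85118b6ff533c&&"


def extract_sys_id(url):
    """Return the first sys_id query value, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    # urlparse splits off everything after '?' as the query string.
    query = urlparse(url).query
    # parse_qs maps each parameter name to a list of values and
    # skips the empty fields produced by the trailing '&&'.
    values = parse_qs(query).get("sys_id")
    return values[0] if values else None


sys_id = extract_sys_id(link)
print(sys_id)
```

Because the query string is parsed rather than pattern-matched, this keeps working if sys_id is not the first parameter.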

You can simply regex out to the &amp instead of the final & like so:

import re
link =  "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&&"
rex = re.compile(r"user_group\.do\?sys_id=(.*)&&")
sysid = rex.search(link).groups()[0]

print(sysid)

.* is greedy, but .*? is not: it is the lazy (non-greedy) form. .? matches any single character zero or one time, while .*? matches as few characters as possible, stopping at the earliest point where the rest of the pattern can match. I hope that explains it.
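
To see the difference between these quantifiers side by side, here is a small sketch run against the link from the question. The variable names are chosen for illustration, and the [^&]* variant is an additional common alternative that is not mentioned above.

```python
import re

link = "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&&"

# Greedy: .* grabs as much as it can, so the capture runs up to the LAST '&'
# and includes the first '&' of the trailing pair.
greedy = re.search(r"sys_id=(.*)&", link).group(1)

# Lazy: .*? grabs as little as it can, so the capture stops at the FIRST '&'.
lazy = re.search(r"sys_id=(.*?)&", link).group(1)

# Negated character class: match any run of characters that are not '&'.
negated = re.search(r"sys_id=([^&]*)", link).group(1)

# .? allows at most one character between '=' and '&', so nothing matches.
single = re.search(r"sys_id=(.?)&", link)

print(greedy)   # 69adb887157e450051e85118b6ff533c&
print(lazy)     # 69adb887157e450051e85118b6ff533c
print(negated)  # 69adb887157e450051e85118b6ff533c
print(single)   # None
```

The None result in the last case is why rex.search(link).groups() in the question raises an AttributeError.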
