
Find all occurrences of multiple regex conditions using python regex

Given two different regex patterns, I want to find all occurrences of both. If only pattern 1 matches, return that match; if only pattern 2 matches, return that one; and if both match, return both. How do I run multiple regexes (two, in this case) in one statement?

Given input string:

"https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

I want to get the values of email and secret only, so the expected output is ['12345', 'test@gmail.com'].

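import re
s = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"
print(re.search(r"(?<=secret=)[^;]+", s).group())  # 12345
print(re.search(r"(?<=email=)[^;]+", s).group())   # test@gmail.com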

I can get the expected output by running the regex multiple times. How do I achieve the same result in a single statement? I don't want to call re.search twice.

>>> re.findall(r"((?:(?<=email=)|(?<=secret=))[^;]+)", s)
['12345', 'test@gmail.com']

But now you'll need a way of identifying which of the resulting values is the secret and which is the email. I'd recommend also extracting this information with the regex (which also eliminates the lookbehind):

>>> dict(kv.split('=', 1) for kv in re.findall(r"((?:secret|email)=[^;]+)", s))
{'secret': '12345', 'email': 'test@gmail.com'}
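A variation on the same idea, offered as a sketch rather than as part of the original answer: when the pattern has two capture groups, re.findall returns (key, value) tuples, which dict() accepts directly, so no split is needed. The \b anchor and the names pairs and params are assumptions added here for illustration.

```python
import re

s = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

# With two capture groups, findall returns a list of (key, value) tuples.
# \b keeps the key from matching in the middle of a longer word.
pairs = re.findall(r"\b(secret|email)=([^;]+)", s)

# dict() accepts an iterable of 2-tuples directly.
params = dict(pairs)
print(params)
```

Note that \b still allows a match after punctuation such as "-", so a stricter prefix may be needed for other URL shapes.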
import re
print(re.findall(r"(?<=secret=)[^;]+|(?<=email=)[^;]+", s))
# output
# ['12345', 'test@gmail.com']
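To illustrate how the alternation behaves in the cases the question asks about (only one pattern matching, or both), here is a small sketch. The helper name extract and the shortened test.com/p URLs are made up for this example. It also shows that results come back in the order they appear in the string, not in the order of the alternatives.

```python
import re

# Same alternation as above, compiled once for reuse.
pattern = re.compile(r"(?<=secret=)[^;]+|(?<=email=)[^;]+")

def extract(s):
    """Return every secret/email value found in s, in order of appearance."""
    return pattern.findall(s)

both = extract("https://test.com/p?secret=12345;email=test@gmail.com")
only_email = extract("https://test.com/p?email=test@gmail.com;new=1")
neither = extract("https://test.com/p?new=1")
swapped = extract("https://test.com/p?email=test@gmail.com;secret=12345")

print(both)        # both values, secret first
print(only_email)  # just the email
print(neither)     # empty list, no exception
print(swapped)     # email first, because it appears first in the string
```

Because the order depends on the input, capturing the key along with the value is safer whenever the caller needs to know which is which.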

You could use a dict comprehension:

import re
url = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

rx = re.compile(r'(?P<key>\w+)=(?P<value>[^;]+)')

dict_ = {m['key']: m['value'] for m in rx.finditer(url)}

# ... then afterwards ...
lst_ = [dict_[key] for key in ("secret", "email") if key in dict_]
print(lst_)
# ['12345', 'test@gmail.com']

In the end I used urllib, as suggested by @ctwheels:

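import urllib.parse as urlparse
from urllib.parse import urlencode, urlunparse

input_string = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"
url_exclude = ["email", "secret"]  # parameters whose values get masked

url_parsed_string = urlparse.urlparse(input_string)
# parse_qs splits on '&' by default in current Python versions; this URL uses ';'
# (the separator argument needs Python 3.9.2+ or a patched earlier release).
parsed_columns = urlparse.parse_qs(url_parsed_string.query, separator=";")
for exclude_column in url_exclude:
    if exclude_column in parsed_columns:
        parsed_columns[exclude_column] = ["xxxxxxxxxx"]
# doseq=True because parse_qs returns a list of values for each key.
# Note that urlencode joins the parameters with '&', not ';'.
qstr = urlencode(parsed_columns, doseq=True)
base_url = urlunparse((url_parsed_string.scheme, url_parsed_string.netloc,
                       url_parsed_string.path, url_parsed_string.params, qstr,
                       url_parsed_string.fragment))
print(base_url)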
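The snippet above masks the values rather than extracting them. For the extraction originally asked about, the same urllib functions can be used; this is a minimal sketch, assuming a Python version whose parse_qs accepts the separator argument (3.9.2 or later). The names params and wanted are illustrative.

```python
from urllib.parse import urlparse, parse_qs

url = "https://test.com/change-password?secret=12345;email=test@gmail.com;previous_password=hello;new=1"

# Take only the query part of the URL (everything after '?').
query = urlparse(url).query

# This URL separates parameters with ';' instead of '&'.
params = parse_qs(query, separator=";")

# parse_qs maps each key to a list of values; take the first value of each.
wanted = [params[key][0] for key in ("secret", "email") if key in params]
print(wanted)
```

Unlike the regex versions, this also percent-decodes the values and skips missing keys without raising an error.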
