
Find multiple things in a string using regex in Python

My input string contains various entities like this: conn_type://host:port/schema#login#password

I want to extract all of them using regex in Python.

Right now I can only find them one at a time, like this:

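import re

test_string = "conn_type://host:port/schema#login#password"

# [a-zA-Z_]+ includes '_'; plain [a-zA-Z]+ would stop at "conn".
conn_type = re.search(r'[a-zA-Z_]+', test_string)
if conn_type:
    print("conn_type:", conn_type.group())
    next_substr_len = conn_type.end()
    host = re.search(r'[^:/]+', test_string[next_substr_len:])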

and so on.

Is there a way to do this without all the if/else checks? I expect there is, but I haven't been able to find it. Note that each entity has its own, different regex.

Please help; I don't want to write boring, repetitive code.
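One way to keep a separate regex per entity and still drop the repeated if blocks is to loop over the patterns, resuming each match where the previous one ended. This is a minimal sketch, assuming the separators from the sample string (://, :, /, #, #) and one connection string per call; the field patterns in ENTITY_PATTERNS are hypothetical choices. Pattern.match(string, pos) is the standard re method that anchors a match at a given index, so no slicing is needed.

```python
import re

# Hypothetical per-entity patterns. Each one consumes the separator that
# precedes its entity and captures the entity itself in group 1.
ENTITY_PATTERNS = [
    ("conn_type", re.compile(r"([A-Za-z_]+)")),
    ("host", re.compile(r"://([^:/]+)")),
    ("port", re.compile(r":([^/]+)")),
    ("schema", re.compile(r"/([^#]+)")),
    ("login", re.compile(r"#([^#]+)")),
    ("password", re.compile(r"#(.+)")),
]


def parse_entities(text):
    """Apply each entity pattern at the index where the previous one stopped."""
    result = {}
    pos = 0
    for name, pattern in ENTITY_PATTERNS:
        # Pattern.match(string, pos) only matches starting exactly at pos.
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"{name!r} not found at index {pos} of {text!r}")
        result[name] = m.group(1)
        pos = m.end()
    return result


print(parse_entities("conn_type://host:port/schema#login#password"))
```

Adding a new entity then means adding one (name, pattern) pair rather than another if block.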

Why don't you use re.findall?

Here is an example:

import re

s = 'conn_type://host:port/schema#login#password asldasldasldasdasdwawwda conn_type://host:port/schema#login#email'

def get_all_matches(s):
    # Raw string avoids invalid-escape warnings; Python needs no semicolons.
    # Every field is letters only here, so a numeric port would not match.
    matches = re.findall(r'[a-zA-Z]+_[a-zA-Z]+://[a-zA-Z]+:[a-zA-Z]+/[a-zA-Z]+#[a-zA-Z]+#[a-zA-Z]+', s)
    return matches

print(get_all_matches(s))

This returns a list of every match for the regex; for the example above, it is:

['conn_type://host:port/schema#login#password', 'conn_type://host:port/schema#login#email']

If you need help building regex patterns in Python, I recommend the following website:

A pretty neat online regex tester

Also check the re module's documentation for more on re.findall

Documentation for re.findall
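One detail from that documentation worth illustrating: the shape of re.findall's result depends on how many capturing groups the pattern has. No groups gives whole matched strings, one group gives that group's text, and several groups give one tuple per match. The sample string below is taken from the question.

```python
import re

s = "conn_type://host:port/schema#login#password"

# No capturing groups: findall returns the full matched strings.
print(re.findall(r"\w+", s))
# ['conn_type', 'host', 'port', 'schema', 'login', 'password']

# Exactly one capturing group: findall returns only that group's text.
print(re.findall(r"#(\w+)", s))
# ['login', 'password']

# Several capturing groups: findall returns one tuple per match.
print(re.findall(r"(\w+)://(\w+)", s))
# [('conn_type', 'host')]
```

This is why a pattern that wraps each field in parentheses returns tuples of fields rather than whole connection strings.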

Hope this helps!

If you prefer a DIY approach, consider writing a tokenizer. It is a very elegant, "Pythonic" solution.
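A minimal sketch of the master-pattern tokenizer technique described in the re module documentation: each token kind is a named group, the groups are joined with |, and finditer reports which kind matched via Match.lastgroup. The token kinds, the FIELDS names, and the rule that the six WORD tokens map to fields in order are all assumptions for this sample format.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    value: str


# Illustrative token kinds; alternatives are tried left to right, so "://"
# must come before the single-character ":".
TOKEN_SPEC = [
    ("SCHEME_SEP", r"://"),
    ("COLON", r":"),
    ("SLASH", r"/"),
    ("HASH", r"#"),
    ("WORD", r"[^:/#\s]+"),
    ("SPACE", r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# Hypothetical field names, assigned to WORD tokens in order of appearance.
FIELDS = ("conn_type", "host", "port", "schema", "login", "password")


def tokenize(text: str) -> Iterator[Token]:
    """Yield non-whitespace tokens; every character belongs to some kind."""
    for m in MASTER_RE.finditer(text):
        if m.lastgroup != "SPACE":
            yield Token(m.lastgroup, m.group())


def parse(text: str) -> dict:
    """Map the WORD tokens of one connection string onto FIELDS."""
    words = [tok.value for tok in tokenize(text) if tok.kind == "WORD"]
    return dict(zip(FIELDS, words))


print(list(tokenize("conn_type://host:port/schema#login#password")))
print(parse("conn_type://host:port/schema#login#password"))
```

A fuller parser would also check that the separator tokens appear in the expected order instead of simply discarding them.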

Or use the standard library: https://docs.python.org/3/library/urllib.parse.html. Be aware, however, that your sample URL is not fully valid: 'conn_type' is not a valid scheme (underscores are not allowed in URL schemes), and the string contains two '#' characters, so everything after the first one becomes a single fragment. As a result, urlparse won't split it the way you expect. For real-life URLs, I highly recommend this approach.
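The sketch below runs urllib.parse.urlsplit on the sample string from the question: the scheme and netloc come back empty, and "login#password" is lumped into the fragment. The second URL is a hypothetical rewrite into standard form (scheme without an underscore, credentials in the userinfo part, numeric port, schema as the path); it is only an assumption about how the connection data could be re-encoded.

```python
from urllib.parse import urlsplit

# The string from the question: '_' is not a legal scheme character, so no
# scheme is recognized, and everything after the first '#' is the fragment.
raw = "conn_type://host:port/schema#login#password"
bad = urlsplit(raw)
print(repr(bad.scheme), repr(bad.netloc), repr(bad.path), repr(bad.fragment))

# Hypothetical re-encoding in standard URL form.
good = urlsplit("conntype://login:password@host:5432/schema")
print(good.scheme, good.hostname, good.port, good.path.lstrip("/"),
      good.username, good.password)
```

Note that SplitResult.port converts the port to an int and raises ValueError for a non-numeric value such as the literal "port".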

>>> import re
>>> uri = "conn_type://host:port/schema#login#password"
>>> res = re.findall(r'(\w+)://(.*?):([A-Za-z0-9]+)/(\w+)#(\w+)#(\w+)', uri)
>>> res
[('conn_type', 'host', 'port', 'schema', 'login', 'password')]

No need for ifs. Use findall or finditer to search through your collection of connection types, then filter the resulting list of tuples as needed.
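A minimal sketch of the finditer-and-filter approach, assuming several connection strings are embedded in one piece of text (the TEXT value and the "mysql" filter are hypothetical). Named groups make each match self-describing: Match.groupdict() returns a dict keyed by group name instead of a positional tuple.

```python
import re

# Hypothetical input: several connection strings separated by other text.
TEXT = (
    "conn_type://host:port/schema#login#password junk "
    "mysql://db1:3306/sales#alice#pw1 "
    "mysql://db2:3306/hr#bob#pw2"
)

CONN_RE = re.compile(
    r"(?P<conn_type>\w+)://(?P<host>[^:/\s]+):(?P<port>\w+)"
    r"/(?P<schema>\w+)#(?P<login>\w+)#(?P<password>\w+)"
)

# finditer yields Match objects lazily; groupdict() maps group names to text.
connections = [m.groupdict() for m in CONN_RE.finditer(TEXT)]

# Filter as needed, e.g. keep only the hosts of MySQL connections.
mysql_hosts = [c["host"] for c in connections if c["conn_type"] == "mysql"]
print(mysql_hosts)  # ['db1', 'db2']
```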
