
Find multiple things in a string using regex in Python

My input string contains several entities, like this: conn_type://host:port/schema#login#password

I want to extract all of them using regex in Python.

As of now, I am able to find them one by one, like this:

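import re

test_string = 'conn_type://host:port/schema#login#password'
# Include "_" in the class, otherwise only "conn" is matched.
conn_type = re.search(r'[a-zA-Z_]+', test_string)
if conn_type:
    print("conn_type:", conn_type.group())
    next_substr_len = conn_type.end()
    host = re.search(r'[^:/]+', test_string[next_substr_len:])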

and so on.

Is there a way to do it without if and else? I expect there to be some way, but I am not able to find it. Please note that the regex for every entity is different.

Please help, I don't want to write boring code.

Why don't you use re.findall?

Here is an example:

import re

s = 'conn_type://host:port/schema#login#password asldasldasldasdasdwawwda conn_type://host:port/schema#login#email'

def get_all_matches(s):
    # Raw string, so "/" needs no escaping; returns every full match in s.
    return re.findall(r'[a-zA-Z]+_[a-zA-Z]+:/+[a-zA-Z]+:+[a-zA-Z]+/+[a-zA-Z]+#+[a-zA-Z]+#[a-zA-Z]+', s)

print(get_all_matches(s))

This returns a list of every substring that matches the regex, which for the example above is:

['conn_type://host:port/schema#login#password', 'conn_type://host:port/schema#login#email']

If you need help building regex patterns in Python, I would recommend the following website:

A pretty neat online regex tester

Also check the re module's documentation for more on re.findall:

Documentation for re.findall

Hope this helps!

If you like it DIY, consider creating a tokenizer. This is a very elegant, Pythonic solution.
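A minimal sketch of the tokenizer idea, assuming the entities always appear in the order shown in the question. The names TOKEN_SPEC, FIELDS, tokenize and parse_connection are made up for illustration, and the WORD pattern is only a guess at what an entity may contain:

```python
import re

# Hypothetical token table: each kind of token has its own pattern.
TOKEN_SPEC = [
    ("SEP", r"://|[:/#]"),          # separators between entities
    ("WORD", r"[A-Za-z0-9_.-]+"),   # one entity value (assumed character set)
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

# Assumed order of the entities in the input string.
FIELDS = ["conn_type", "host", "port", "schema", "login", "password"]

def tokenize(text):
    """Yield (kind, value) pairs for every token found in text."""
    for match in TOKEN_RE.finditer(text):
        yield match.lastgroup, match.group()

def parse_connection(text):
    """Pair the WORD tokens with the field names, in order of appearance."""
    words = [value for kind, value in tokenize(text) if kind == "WORD"]
    return dict(zip(FIELDS, words))

result = parse_connection("conn_type://host:port/schema#login#password")
print(result)
```

Adding a new kind of entity then means adding a row to the table rather than another if branch.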

Or use the standard library: https://docs.python.org/3/library/urllib.parse.html. Note, however, that your sample URL is not fully valid: 'conn_type' is not a recognized URL scheme and the string contains two '#' anchors, so urlparse would not work as expected. For real-life URLs, though, I highly recommend this approach.
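For a well-formed URL, the standard-library route might look like the sketch below. The URL is a made-up example in the usual scheme://user:password@host:port/path layout, not the format from the question, and the keys of the fields dictionary are chosen only to mirror the question's entity names:

```python
from urllib.parse import urlsplit

# Hypothetical, well-formed connection URL (not the question's format).
url = "postgresql://login:password@host:5432/schema"
parts = urlsplit(url)

fields = {
    "conn_type": parts.scheme,
    "host": parts.hostname,
    "port": parts.port,                 # parsed to an int
    "schema": parts.path.lstrip("/"),   # path without the leading slash
    "login": parts.username,
    "password": parts.password,
}
print(fields)
```

No regex and no if chain is needed here, because urlsplit already knows where each component of a URL starts and ends.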

>>> import re
>>> uri = "conn_type://host:port/schema#login#password"
>>> res = re.findall(r'(\w+)://(.*?):([A-Za-z0-9]+)/(\w+)#(\w+)#(\w+)', uri)
>>> res
[('conn_type', 'host', 'port', 'schema', 'login', 'password')]

No need for ifs. Use findall or finditer to search through your collection of connection strings, then filter the resulting list of tuples as needed.
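As a rough illustration of the finditer variant, the sketch below uses named groups so that each match becomes a dictionary instead of a positional tuple. The group names, the sample text and the mysql entry are assumptions made for the example:

```python
import re

# One named group per entity; the names are illustrative.
PATTERN = re.compile(
    r"(?P<conn_type>\w+)://(?P<host>[^:/\s]+):(?P<port>\w+)"
    r"/(?P<schema>\w+)#(?P<login>\w+)#(?P<password>\w+)"
)

# Made-up input containing two connection strings and some noise.
text = (
    "conn_type://host:port/schema#login#password some other text "
    "mysql://db.example.com:3306/sales#alice#secret"
)

# Each match is turned into a dict keyed by entity name.
connections = [m.groupdict() for m in PATTERN.finditer(text)]

# Filtering is a plain list comprehension over those dicts.
mysql_only = [c for c in connections if c["conn_type"] == "mysql"]

print(connections)
print(mysql_only)
```

Since every entity has its own sub-pattern inside one regex, the different per-entity rules from the question can live in the same expression.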
