
Python Regex - Escaping

I have a Python regex that takes a string (a database connection URI) and splits it, using named groups, into username, password, etc.

import re

uri = 'username:password@host/database'
# Raw string so the regex reaches the re module exactly as written
m = re.compile(r'^(?P<user>[^:@]+)(:(?P<password>[^@]*))?@(?P<host>[^:@/]+)(:(?P<port>[0-9]+))?/(?P<db>[^?]+)?$').match(uri)
print(m.groupdict())
{'user': 'username', 'password': 'password', 'host': 'host', 'port': None, 'db': 'database'}

This works fine. The problem arises when the URI has an @ symbol in the password, since @ is what separates the password from the host. For example,

uri = 'username:p@ssword@host/database'

will not match, which is expected. However, I'd like to be able to escape the special character, e.g.:

uri = r'username:p\@ssword@host/database'

and have it match. My regex experience is pretty limited. I guess what I'd like to do is modify the

(?P<password>[^@]*)

group so that it will match any character that's not a @, unless the @ is preceded by a \ character. Of course, some (most) connection strings will not contain a \@ at all.

Any help much appreciated.

You could do:

(?P<password>([^\\@]|\\.)*)

This scans through your string and matches either a character that is neither \ nor @, or a backslash together with whatever character follows it. The only way an '@' can be matched by that regex is if it sneaks in through the \\. alternative, i.e. it's escaped.
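As a minimal sketch of how that group could slot into the question's pattern, assuming Python 3: the names URI_RE and unescape are illustrative and not from the original posts, and the inner group is written as non-capturing, which does not affect the named groups.

```python
import re

# The question's pattern with the password group swapped for the escape-aware one.
URI_RE = re.compile(
    r'^(?P<user>[^:@]+)'
    r'(:(?P<password>(?:[^\\@]|\\.)*))?'
    r'@(?P<host>[^:@/]+)'
    r'(:(?P<port>[0-9]+))?'
    r'/(?P<db>[^?]+)?$'
)

def unescape(value):
    """Hypothetical helper: drop the backslash in front of each escaped character."""
    return re.sub(r'\\(.)', r'\1', value)

escaped = URI_RE.match(r'username:p\@ssword@host/database')
plain = URI_RE.match('username:password@host/database')
unescaped_at = URI_RE.match('username:p@ssword@host/database')

print(escaped.group('password'))            # the raw, still-escaped password
print(unescape(escaped.group('password')))  # the password with the escape removed
print(plain.groupdict())
print(unescaped_at)                         # None: an unescaped @ still fails to match
```

The captured password keeps its backslash, so some unescaping step like the one sketched here is probably needed before the value is used.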

As an aside, to write a regex in Python, use a raw string: r"insert_regex_here".

Otherwise, for a regex \\. you have to write it in Python as "\\\\." . To avoid that, you can write r"\\." instead.
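A quick way to check that the raw and non-raw spellings of the pattern \\. are the same string (the variable names are only for illustration):

```python
import re

# Both literals produce the same three characters: backslash, backslash, dot.
plain_literal = "\\\\."
raw_literal = r"\\."
print(plain_literal == raw_literal)  # True
print(len(raw_literal))              # 3

# As a regex, those characters mean: a literal backslash, then any one character.
print(re.findall(raw_literal, r"p\@ss\:word"))
```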

I would recommend using re.split:

>>> import re
>>> print(re.split(r"(?<!\\)@|/|:", r"username:password@host/database"))
['username', 'password', 'host', 'database']
>>> print(re.split(r"(?<!\\)@|/|:", r"username:p\@ssword@host/database"))
['username', 'p\\@ssword', 'host', 'database']
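One way the split result might be turned into named fields is sketched below. The helper split_uri is hypothetical, and it assumes the URI has exactly the four parts user, password, host and database. Because the pattern splits on every : and /, a URI with a port yields a fifth part and would need different handling.

```python
import re

# Same separator pattern as in the answer: an @ not preceded by a backslash, or / or :
SEPARATORS = re.compile(r"(?<!\\)@|/|:")

def split_uri(uri):
    """Hypothetical helper: split a user:password@host/database URI into named fields."""
    # Unpacking raises ValueError unless the split yields exactly four parts.
    user, password, host, db = SEPARATORS.split(uri)
    # Remove the backslash that protected each escaped @ in the password.
    return {"user": user, "password": password.replace("\\@", "@"), "host": host, "db": db}

print(split_uri(r"username:p\@ssword@host/database"))
# A URI with a port produces five parts instead of four.
print(SEPARATORS.split("username:password@host:5432/database"))
```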

My take is that you want greedy matching: the password runs up to the last @, and the hostname sits between the last @ and the first /.

A simple way could be like this:

In [68]: re.match('((?P<user>.*):)((?P<pass>.*)@)((?P<host>.*)/)((?P<db>.*))', "username:p@ssword@host/data").groupdict()
Out[68]: {'db': 'data', 'host': 'host', 'pass': 'p@ssword', 'user': 'username'}

You might want to make some groups optional, i.e. (stuff)?, if e.g. the username and password can be omitted.
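A rough sketch of the greedy approach with optional parts, assuming Python 3. The pattern below is one possible variant rather than the answer's exact regex: it wraps the whole credentials part in a single optional group and restricts the user to non-colon characters and the host to non-slash characters. The name GREEDY_RE is illustrative.

```python
import re

GREEDY_RE = re.compile(
    r'((?P<user>[^:]*)(:(?P<pass>.*))?@)?'  # optional "user[:pass]@"; pass is greedy up to the last @
    r'(?P<host>[^/]*)/'                     # host runs up to the first /
    r'(?P<db>.*)'                           # everything after that is the database
)

for uri in ("username:p@ssword@host/data", "username@host/data", "host/data"):
    # Omitted parts show up as None in the resulting dict.
    print(uri, GREEDY_RE.match(uri).groupdict())
```

With this variant no escaping is needed for an @ inside the password, at the cost of not supporting an @ in the host name.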
