Python Regex - Escaping
I have a Python regex that takes a string (a database connection URI) and splits it, using named groups, into username, password, etc.
import re

uri = 'username:password@host/database'
m = re.compile(r'^(?P<user>[^:@]+)(\:(?P<password>[^@]*))?@(?P<host>[^\:@/]+)(\:(?P<port>[0-9]+))?/(?P<db>[^\?]+)?$').match(uri)
print(m.groupdict())
{'user': 'username', 'password': 'password', 'host': 'host', 'port': None, 'db': 'database'}
This works fine. The problem arises if the URI has an @ symbol in it, since that's used to split password and host. For example,
uri = 'username:p@ssword@host/database'
will not match, which is expected. However, I'd like to be able to escape the special character, e.g.:
uri = r'username:p\@ssword@host/database'
and have it match. My regex experience is pretty limited - I guess what I'd like to do is modify the
(?P<password>[^@]*)
group so that it will match any character that's not an @, unless it's preceded by a \ character. Of course, some (most) connection strings will not contain a \@ at all.
Any help much appreciated.
You could do:
(?P<password>([^\\@]|\\.)*)
This scans through your string and matches either a character that is neither \ nor @, or a backslash, in which case it also matches whatever character follows. The only way an '@' can be matched by that regex is if it sneaks in through the \\. alternative, i.e. it's escaped.
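As a rough sketch of how this fits together, the snippet below drops the suggested password group into the pattern from the question. The names URI_RE, parse_uri and unescape are made up for this illustration, and stripping the backslash afterwards is an extra step that the answer does not mention.

```python
import re

# Pattern from the question, with the password group replaced by the
# escape-aware version: any char that is not \ or @, or a backslash
# followed by any single character.
URI_RE = re.compile(
    r'^(?P<user>[^:@]+)'
    r'(:(?P<password>([^\\@]|\\.)*))?'
    r'@(?P<host>[^:@/]+)'
    r'(:(?P<port>[0-9]+))?'
    r'/(?P<db>[^?]+)?$'
)

def parse_uri(uri):
    """Return the named groups as a dict, or None if the URI does not match."""
    m = URI_RE.match(uri)
    return m.groupdict() if m else None

def unescape(value):
    """Hypothetical helper: turn each backslash-escaped character into itself."""
    return re.sub(r'\\(.)', r'\1', value)

plain = parse_uri('username:password@host/database')
escaped = parse_uri(r'username:p\@ssword@host/database')
unescaped_at = parse_uri('username:p@ssword@host/database')

print(plain)
print(escaped)
print(unescaped_at)
print(unescape(escaped['password']))
```

Note that the matched password still contains the backslash, so something like unescape is needed if the real password is wanted.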
As an aside, to write a regex in Python, use r"insert_regex_here". Otherwise, for a regex \\. you have to write it in Python as "\\\\.". To avoid that, you can write r"\\.".
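A quick way to convince yourself that the two spellings are the same pattern is to compare the string literals directly. This is only a small check; the variable names are arbitrary.

```python
import re

plain_literal = "\\\\."  # normal string: every regex backslash is doubled again
raw_literal = r"\\."     # raw string: written exactly as the regex reads

# Both literals hold the same three characters: backslash, backslash, dot.
same = plain_literal == raw_literal

# The pattern matches a literal backslash followed by any one character.
match = re.match(raw_literal, r"\@")

print(same)
print(match.group(0))
```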
I would recommend using re.split:
>>> import re
>>> print(re.split(r"(?<!\\)@|/|:", r"username:password@host/database"))
['username', 'password', 'host', 'database']
>>> print(re.split(r"(?<!\\)@|/|:", r"username:p\@ssword@host/database"))
['username', 'p\\@ssword', 'host', 'database']
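One thing to keep in mind with the split approach is that the result is positional. The sketch below uses a made-up URI with a port (not taken from the question) to show that an optional part changes the number of fields, so the caller has to work out which field is which.

```python
import re

# '@' not preceded by a backslash, or '/', or ':'
SEPARATORS = re.compile(r"(?<!\\)@|/|:")

no_port = SEPARATORS.split(r"username:p\@ssword@host/database")
with_port = SEPARATORS.split("username:password@host:5432/database")

print(no_port)    # four fields
print(with_port)  # five fields, the port sits between host and database
```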
My take is that you want greedy matching, that is, the password runs up to the last @ and the hostname is between the last @ and the first /.
A simple way could be like this:
In [68]: re.match('((?P<user>.*):)((?P<pass>.*)@)((?P<host>.*)/)((?P<db>.*))', "username:p@ssword@host/data").groupdict()
Out[68]: {'db': 'data', 'host': 'host', 'pass': 'p@ssword', 'user': 'username'}
You might want to add optionals, that is (stuff)?, if e.g. the username and password can be omitted.
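A minimal sketch of what such optionals could look like is given below. It is a variation on the pattern above rather than the answerer's exact regex: the credentials are wrapped in an optional (...)? group, and the user and host parts are restricted to [^:]* and [^/]* as an assumption to keep the match unambiguous.

```python
import re

# Optional "user[:pass]@" prefix, then host, '/', and database.
# The password part stays greedy, so it runs up to the last '@'.
GREEDY_RE = re.compile(
    r'^((?P<user>[^:]*)(:(?P<pass>.*))?@)?'
    r'(?P<host>[^/]*)/(?P<db>.*)$'
)

full = GREEDY_RE.match('username:p@ssword@host/data').groupdict()
no_pass = GREEDY_RE.match('username@host/data').groupdict()
bare = GREEDY_RE.match('host/data').groupdict()

print(full)
print(no_pass)
print(bare)
```

Groups that did not take part in the match come back as None in groupdict().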