Python regular expressions: escaping

Question

I have a Python regex that takes a string (a database connection URI) and splits it, using named groups, into username, password, and so on.

import re

uri = 'username:password@host/database'
m = re.compile(r'^(?P<user>[^:@]+)(:(?P<password>[^@]*))?@(?P<host>[^:@/]+)(:(?P<port>[0-9]+))?/(?P<db>[^?]+)?$').match(uri)
print(m.groupdict())
{'user': 'username', 'password': 'password', 'host': 'host', 'port': None, 'db': 'database'}

This works fine. The problem arises when the URI has an @ symbol in it, since @ is what separates the password from the host. For example,

uri = 'username:p@ssword@host/database'

will not match, which is expected. However, I'd like to be able to escape the special character, e.g.:

uri = r'username:p\@ssword@host/database'

and have it match. My regex experience is pretty limited. I guess what I'd like to do is modify the

(?P<password>[^@]*)

group so that it will match any character that's not a @, unless that @ is preceded by a \ character. Of course, some (most) connection strings will not contain a \@ at all.

Any help much appreciated.

Answer 1

You could do:

(?P<password>([^\\@]|\\.)*)

This scans through your string and, at each position, matches either a character that is neither \ nor @, or a backslash together with whatever character follows it. The only way an '@' can be matched by that regex is if it sneaks in through the \\. alternative, i.e. it's escaped.
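To see that group in context, here is a minimal sketch that drops it into the pattern from the question. The helper name parse_uri and the unescaping step are my own additions for illustration, not something the question asked for, and the needless escapes before ':' and '?' are removed.

```python
import re

# The question's pattern with the password group replaced by the
# escape-aware version. (?:...) is used so no extra capture group is added.
URI_RE = re.compile(
    r'^(?P<user>[^:@]+)'
    r'(:(?P<password>(?:[^\\@]|\\.)*))?'
    r'@(?P<host>[^:@/]+)'
    r'(:(?P<port>[0-9]+))?'
    r'/(?P<db>[^?]+)?$'
)

def parse_uri(uri):
    """Hypothetical helper: return the named groups, or None if no match."""
    m = URI_RE.match(uri)
    if m is None:
        return None
    parts = m.groupdict()
    if parts['password'] is not None:
        # Turn each escaped pair such as \@ back into the bare character.
        parts['password'] = re.sub(r'\\(.)', r'\1', parts['password'])
    return parts

print(parse_uri(r'username:p\@ssword@host/database'))  # escaped @ is accepted
print(parse_uri('username:p@ssword@host/database'))    # unescaped @ still fails
```

Whether the password should be unescaped after matching depends on what consumes it; drop the re.sub call if the backslash should be kept.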

As an aside, to write a regex in Python, use a raw string: r"insert_regex_here".

Otherwise, for the regex \\. you have to write it in Python as "\\\\.". To avoid that, you can write r"\\.".
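A quick way to convince yourself that the two spellings are the same pattern (the variable names and the sample string here are just for illustration):

```python
import re

plain = "\\\\."  # ordinary literal: every \\ in the source becomes one backslash
raw = r"\\."     # raw literal: backslashes are kept exactly as typed

print(plain == raw)  # True, both hold the three characters \ \ .
print(len(raw))      # 3

# The pattern means "a literal backslash followed by any one character".
print(re.findall(raw, r"a\@b\\c"))
```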

Answer 2

I would recommend using re.split:

>>> import re
>>> print(re.split(r"(?<!\\)@|/|:", r"username:password@host/database"))
['username', 'password', 'host', 'database']
>>> print(re.split(r"(?<!\\)@|/|:", r"username:p\@ssword@host/database"))
['username', 'p\\@ssword', 'host', 'database']
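The negative lookbehind (?<!\\) is what stops the split at an @ that has a backslash in front of it. Note that this splits on every ':' and '/', so the fields come back unlabeled and an optional port changes the number of fields. A rough sketch of wrapping it, assuming only the simple user:password@host/database form (split_uri is a made-up helper name):

```python
import re

SPLIT_RE = re.compile(r"(?<!\\)@|/|:")

def split_uri(uri):
    """Hypothetical helper for the simple user:password@host/database form."""
    parts = SPLIT_RE.split(uri)
    if len(parts) != 4:
        # e.g. a port, or a ':' or '/' inside the password, adds extra fields
        raise ValueError("expected exactly four fields, got %d" % len(parts))
    user, password, host, db = parts
    # Assumption: the escape is only needed for @, so only \@ is unescaped.
    password = password.replace("\\@", "@")
    return {"user": user, "password": password, "host": host, "db": db}

print(split_uri(r"username:p\@ssword@host/database"))
```

If the port or other optional parts matter, the named-group regex from the question is probably the better fit.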

Answer 3

My take is that you want greedy matching, that is, the password runs up to the last @ and the hostname is between the last @ and the first /.

A simple way could be like this:

In [68]: re.match('((?P<user>.*):)((?P<pass>.*)@)((?P<host>.*)/)((?P<db>.*))', "username:p@ssword@host/data").groupdict()
Out[68]: {'db': 'data', 'host': 'host', 'pass': 'p@ssword', 'user': 'username'}

You might want to add optionals, that is (stuff)?, if e.g. the username and password can be omitted.
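One possible way to add those optionals, as a sketch only: the exact character classes below are my assumption about what a user and host may contain, and fullmatch is used so the whole string has to be consumed.

```python
import re

OPTIONAL_RE = re.compile(
    r'(?:(?P<user>[^:@/]+)(?::(?P<pass>.*))?@)?'  # optional user[:pass]@
    r'(?P<host>[^@/]+)/'                          # host, up to the next /
    r'(?P<db>.*)'                                 # the rest is the database
)

for uri in ("username:p@ssword@host/data", "username@host/data", "host/data"):
    print(uri, OPTIONAL_RE.fullmatch(uri).groupdict())
```

The password is still matched greedily with .*, so an unescaped @ inside it is fine: the engine backtracks to the last @ that leaves a valid host and database behind it. Groups that did not take part in the match come back as None.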

3 answers

Answer 1
0 2012-02-06 06:23:21

Answer 2
0 2012-02-06 06:26:53

Answer 3
0 accepted 2012-02-06 12:43:59
