How to parse this string in Python using one regular expression

Question

I need to parse this string with only one regular expression in Python. For every group, I need to save the value in a specific field. The problem is that one or more of the parameters may be missing or appear in a different order (i.e. domain 66666 ip nonce, with the middle part missing).

3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h

I need to assign:

time=2013-02-10T06:44:30.666821+00:00 (constant format)
domain=domain (string)
code=66666 (integer)
ip=127.0.0.1 (string)
pubvalue=kjiduensofksidoposiw (string of fixed length)
nonce=7896089hujoiuhiuh098h (string)

EDIT

This is an example of how the string can vary: 123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw&nonce=7896089hujoiuhiuh098h

Thank you in advance for showing me the way.

Answer 1

It's probably not a good idea to use one regex to parse the whole string, but I think the solution is to use named groups (see: Named groups on Regex Tutorial). Named groups can be captured with (?P<nameofgroup>bla).

So you can match, for example, the IP with:

import re
# Sample log line from the question ("line" avoids shadowing the built-in str)
line = "3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h"
# Capture the bracketed IPv4 address in a named group called "ip"
print(re.search(r"\[(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]", line).groupdict())

Just extend this regular expression with the patterns you need to match the other fields.
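As a rough sketch of what such an extended pattern might look like, the example below adds named groups for the other fields listed in the question. It assumes the literal word "constant" always separates the domain from the code, and that pubvalue appears before nonce in the request; the name LOG_PATTERN is only illustrative.

```python
import re

# Illustrative pattern; group names follow the fields listed in the question.
# Assumptions: the word "constant" is literal, and pubvalue precedes nonce.
LOG_PATTERN = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2})\s+"
    r"(?P<domain>\S+)\s+constant\s+"
    r"(?P<code>\d+)\s+"
    r"sync:\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]\s+Request:\s+"
    r".*?pubvalue=(?P<pubvalue>[^&\s]+)"
    r".*?nonce=(?P<nonce>[^&\s]+)"
)

line = (
    "3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 "
    "sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534"
    "&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h"
)

fields = LOG_PATTERN.search(line).groupdict()
fields["code"] = int(fields["code"])  # the question wants code as an integer
print(fields)
```

Every captured value is a string, so the integer conversion for code has to happen after matching.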

You can make a group optional by placing a ? after the group's parentheses, like: (?P<ip>pattern)?. If a pattern could not be matched, the element in the dict will be None.
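To illustrate the optional-group behaviour, here is a small sketch using a shortened, hypothetical pattern in which the code and the bracketed IP are each optional. Each optional part is wrapped in a non-capturing group so that the separator before it becomes optional too.

```python
import re

# Hypothetical, shortened pattern: code and IP are each optional.
PATTERN = re.compile(
    r"(?P<domain>\S+) constant"
    r"(?: (?P<code>\d+))?"
    r"(?: sync:\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\])?"
)

# All parts present
full = PATTERN.search("google constant 12356 sync:[192.168.0.1] Request:").groupdict()
# The sync:[...] part is missing, so the "ip" group does not participate
partial = PATTERN.search("google constant 12356 Request:").groupdict()

print(full)
print(partial)
```

Code that consumes the dict therefore has to check each field for None before using it.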

But note: it is not a good idea to do this in only one regex. It will be slow (because of backtracking and so on), and the regex will be long and complicated to maintain!
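If the single-regex requirement can be relaxed, one possible alternative (not part of the original answer) is to match only the fixed header with a short regex and hand the request parameters to urllib.parse.parse_qs from the standard library, which does not care about parameter order. The names HEADER and parse_line are hypothetical, and the header layout is assumed to be the one shown in the question.

```python
import re
from urllib.parse import parse_qs

# Assumed header layout: <time> <domain> constant <code> sync:[<ip>] Request: <query>
HEADER = re.compile(
    r"(?P<time>\S+) (?P<domain>\S+) constant (?P<code>\d+) "
    r"sync:\[(?P<ip>[^\]]+)\] Request: (?P<query>\S*)"
)

def parse_line(line):
    """Return a dict of fields, or None if the header does not match."""
    match = HEADER.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    # parse_qs maps each parameter name to a list of values
    params = parse_qs(fields.pop("query"))
    for key in ("pubvalue", "nonce"):
        values = params.get(key)
        fields[key] = values[0] if values else None
    return fields

sample = (
    "123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 "
    "sync:[192.168.0.1] Request: nonce=7896089hujoiuhiuh098h&pubvalue=fggggggeesidoposiw"
)
result = parse_line(sample)
print(result)
```

Here a missing request parameter simply comes back as None, and the regex stays short enough to read at a glance.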


Solution 1
1 Accepted 2013-02-21 09:32:07
