How to parse this string with a single regular expression in Python
I need to parse this string with only one regular expression in Python. For every group I need to save the value in a specific field. The problem is that one or more of the parameters may be missing or be in a different order (i.e. domain 66666 ip nonce, with the middle part missing).
3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h
I need to assign:
time=2013-02-10T06:45:30.666821+00:00 (constant format)
domain=domain (string)
code=66666 (integer)
ip=127.0.0.1 (string)
pubvalue=kjiduensofksidoposiw (string of fixed length)
nonce=7896089hujoiuhiuh098h (string)
EDIT
This is an example of how the string can vary:
123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw&nonce=7896089hujoiuhiuh098h
Thank you in advance for showing me the way.
It's probably not a good idea to use one regex to parse the whole string, but I think the solution is to use named groups (see: Named groups on Regex Tutorial). Named groups can be captured with (?P<nameofgroup>bla).
So you can match, for example, the ip with:
import re
# Avoid naming the variable "str", which would shadow the built-in type.
line = "3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h"
# Raw string so the backslashes reach the regex engine unchanged.
print(re.search(r"\[(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]", line).groupdict())
Just extend this regular expression with the patterns you need to match the other stuff. You can make a group optional by placing a ? after the group's parentheses, like: (?P<ip>pattern)?. If a pattern could not be matched, the element in the dict will be None.
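As a rough sketch of how the extended pattern might look, the example below combines named groups with optional matching. It is one possible construction, not the only one. A bare (?P<ip>pattern)? in the middle of a searched pattern can match the empty string at the wrong position, so each field here is wrapped as an optional (?:.*?...)? inside a lookahead anchored at the start of the line; this also lets the fields appear in any order. The names LINE_RE and parse_line are hypothetical, and the pattern assumes that the literal word "constant" always sits between the domain and the code, as in both sample lines.

```python
import re

# Each lookahead scans from the start of the line, so field order does not
# matter. The inner (?:...)? makes the field optional: if it cannot be found
# anywhere, the lookahead still succeeds and the named group stays None.
LINE_RE = re.compile(
    r"^"
    r"(?=(?:.*?(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}))?)"
    # Assumption: the literal word "constant" separates domain and code.
    r"(?=(?:.*?(?P<domain>\S+) constant (?P<code>\d+))?)"
    r"(?=(?:.*?\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\])?)"
    r"(?=(?:.*?\bpubvalue=(?P<pubvalue>[^&\s]+))?)"
    r"(?=(?:.*?\bnonce=(?P<nonce>[^&\s]+))?)"
)


def parse_line(line):
    """Return a dict of the named fields; missing fields are None."""
    # Every lookahead is optional, so the pattern always matches at position 0.
    fields = LINE_RE.match(line).groupdict()
    if fields["code"] is not None:
        fields["code"] = int(fields["code"])  # the question wants an integer
    return fields


if __name__ == "__main__":
    sample = (
        "123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 "
        "sync:[192.168.0.1] Request: "
        "pubvalue=fggggggeesidoposiw&nonce=7896089hujoiuhiuh098h"
    )
    print(parse_line(sample))
```

The pattern does not check that pubvalue has the fixed length mentioned in the question, since that length is not stated; replacing [^&\s]+ with a bounded quantifier would add that check.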
But notice: it is not a good idea to do this in only one regex. It will be slow (because of backtracking and the like), and the regex will be long and complicated to maintain!
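For comparison, here is a minimal sketch of the multi-step alternative, which goes beyond what the question asked for. It assumes the key=value part always follows the literal text "Request: " and again that the word "constant" sits between domain and code. The function name parse_line_stepwise and the pattern names are hypothetical. Each small pattern handles one field, and the standard library's urllib.parse.parse_qs splits the key=value pairs.

```python
import re
from urllib.parse import parse_qs

# One small, easy-to-read pattern per field instead of one large expression.
TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}")
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")
# Assumption: the literal word "constant" separates domain and code.
DOMAIN_CODE_RE = re.compile(r"(\S+) constant (\d+)")


def parse_line_stepwise(line):
    """Return a dict of the fields; missing fields are None."""
    # Assumption: the key=value pairs follow the literal text "Request: ".
    head, _, query = line.partition("Request: ")

    time_m = TIME_RE.search(head)
    ip_m = IP_RE.search(head)
    dc_m = DOMAIN_CODE_RE.search(head)
    params = parse_qs(query)  # maps each key to a list of values

    return {
        "time": time_m.group(0) if time_m else None,
        "domain": dc_m.group(1) if dc_m else None,
        "code": int(dc_m.group(2)) if dc_m else None,
        "ip": ip_m.group(1) if ip_m else None,
        "pubvalue": params.get("pubvalue", [None])[0],
        "nonce": params.get("nonce", [None])[0],
    }
```

Because each field is searched independently, a missing or reordered parameter affects only its own entry in the result.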