How to parse this string with a single regular expression in Python

I need to parse the string below with a single regular expression in Python, saving the value of each group in a specific field. The problem is that one or more of the parameters may be missing or appear in a different order (e.g. domain 66666 ip nonce, with the middle part missing).

3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h

I need to assign:

  • time=2013-02-10T06:44:30.666821+00:00 (constant format)
  • domain=domain (string)
  • code=66666 (integer)
  • ip=127.0.0.1 (string)
  • pubvalue=kjiduensofksidoposiw (string of fixed length)
  • nonce=7896089hujoiuhiuh098h (string)

EDIT

This is an example of how the string can vary:

123dsf 2014-01-11T06:49:30.666821+00:00 google constant 12356 sync:[192.168.0.1] Request: pubvalue=fggggggeesidoposiw&nonce=7896089hujoiuhiuh098h

Thank you in advance for showing me the way.

It's probably not a good idea to use one regex to parse the whole string, but I think the solution is to use named groups (see "Named groups" in the Regex Tutorial). A named group is written (?P<nameofgroup>bla) and can be read back by name after a match.

So you can match for example the ip with:

import re

# Avoid naming the variable "str": that would shadow the built-in type.
line = "3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw&change=09872534&value2=jdmcnhj&counter=232&value3=2&nonce=7896089hujoiuhiuh098h"
# Raw string so the backslashes reach the regex engine unchanged.
print(re.search(r"\[(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]", line).groupdict())

Just extend this regular expression with the patterns you need to match the other fields.

You can make a group optional by placing a ? after its closing parenthesis, like (?P<ip>pattern)? . If the pattern could not be matched, the corresponding entry in the dict will be None .
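As a rough sketch of how the optional named groups could be combined, the pattern below covers only the leading part of the line (time, domain, code, ip). The name parse_header and the assumed field order (id, time, domain, then an optional "constant", code and sync:[ip]) are illustrative guesses based on the two sample strings, not a confirmed format. Each optional field carries its own leading whitespace inside a non-capturing group, so a missing field does not break the match.

```python
import re

# Assumed layout: <id> <time> <domain> [constant] [code] [sync:[ip]] ...
LINE_RE = re.compile(
    r"^\S+\s+"                                           # leading id, e.g. 3249dsf
    r"(?P<time>\d{4}-\d{2}-\d{2}T\S+)\s+"                # timestamp token
    r"(?P<domain>\S+)"                                   # domain token
    r"(?:\s+constant)?"                                  # optional literal word
    r"(?:\s+(?P<code>\d+))?"                             # optional integer code
    r"(?:\s+sync:\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\])?"  # optional ip
)


def parse_header(line):
    """Return a dict of the named groups, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


full = ("3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 "
        "sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw")
no_ip = ("3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 "
         "Request: pubvalue=kjiduensofksidoposiw")

print(parse_header(full))
print(parse_header(no_ip))   # the "ip" entry is None here
```

Note that every value comes back as a string (or None), so code still needs an int() conversion, and fields that appear in a different order are not handled by this pattern.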

But note: it is not a good idea to do this with only one regex. It can be slow (many optional groups invite backtracking), and the regex will be long and hard to maintain!
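For comparison, here is a minimal sketch of the multi-step alternative that the warning above points towards: a few small patterns for the leading part, and the standard library's urllib.parse.parse_qs for the key=value pairs after "Request:". The function name parse_line is hypothetical, and the sketch assumes that the code is the only whitespace-delimited all-digit token before "Request:" (so the leading id must not be purely numeric).

```python
import re
from urllib.parse import parse_qs

HEAD_RE = re.compile(r"(?P<time>\d{4}-\d{2}-\d{2}T\S+)\s+(?P<domain>\S+)")
# Assumption: the code is the only standalone all-digit token in the header.
CODE_RE = re.compile(r"(?<!\S)(?P<code>\d+)(?!\S)")
IP_RE = re.compile(r"\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_line(line):
    """Parse one log line into a dict; missing fields stay None."""
    fields = dict.fromkeys(
        ("time", "domain", "code", "ip", "pubvalue", "nonce"))
    # Everything before "Request:" is the header, the rest is a query string.
    head, _, query = line.partition("Request:")
    for pattern in (HEAD_RE, CODE_RE, IP_RE):
        m = pattern.search(head)
        if m:
            fields.update(m.groupdict())
    # parse_qs handles any parameter order and ignores nothing it finds.
    params = parse_qs(query.strip())
    for key in ("pubvalue", "nonce"):
        if key in params:
            fields[key] = params[key][0]
    if fields["code"] is not None:
        fields["code"] = int(fields["code"])
    return fields


line = ("3249dsf 2013-02-10T06:44:30.666821+00:00 domain constant 66666 "
        "sync:[127.0.0.1] Request: pubvalue=kjiduensofksidoposiw"
        "&change=09872534&value2=jdmcnhj&counter=232&value3=2"
        "&nonce=7896089hujoiuhiuh098h")
print(parse_line(line))
```

Because each pattern is searched independently and the query string is parsed by key, the order of the parameters no longer matters, and each piece stays short enough to read and test on its own.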
