
Parsing from format in Python

Is there any way in Python to reverse the formatting operation done through the "%" operator?

formated = "%d ooo%s" % (12, "ps")
#formated is now '12 ooops'
(arg1, arg2) = theFunctionImSeeking("12 ooops", "%d ooo%s")
#arg1 is 12 and arg2 is "ps"

EDIT: Regular expressions can solve this, but they are harder to write, and I suspect they are slower because they can handle more complex structures. I would really like an equivalent of sscanf.

Use regular expressions (the re module):

>>> import re
>>> match = re.search(r'(\d+) ooo(\w+)', '12 ooops')
>>> match.group(1), match.group(2)
('12', 'ps')

Regular expressions are as close as you can get to what you want. There is no built-in way to do it using the same format string ('%d ooo%s').
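Note that the groups come back as strings, so getting the integer 12 from the question needs an explicit conversion. A minimal sketch of that step follows; the helper name parse_ooo is made up for illustration and is not part of any library:

```python
import re

# Hypothetical helper (illustration only): parse "<int> ooo<word>".
def parse_ooo(text):
    # fullmatch requires the whole string to fit the pattern.
    match = re.fullmatch(r'(\d+) ooo(\w+)', text)
    if match is None:
        raise ValueError("String doesn't match pattern")
    # Groups are always strings; convert the numeric one explicitly.
    return int(match.group(1)), match.group(2)

print(parse_ooo('12 ooops'))
```

Using fullmatch rather than search is a design choice here: it rejects input with unexpected text before or after the part that matches.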

EDIT: As @Daenyth suggested, you could implement your own function with this behaviour:

import re

def python_scanf(my_str, pattern):
    # Greedy groups plus fullmatch, so a trailing %s captures the whole tail.
    D = ('%d', r'(\d+)')
    F = ('%f', r'(\d+\.\d+)')
    S = ('%s', r'(.+)')
    re_pattern = pattern.replace(*D).replace(*F).replace(*S)
    match = re.fullmatch(re_pattern, my_str)
    if match:
        return match.groups()
    raise ValueError("String doesn't match pattern")

Usage:

>>> python_scanf("12 ooops", "%d ooo%s")
('12', 'ps')
>>> python_scanf("12 ooops", "%d uuu%s")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 10, in python_scanf
ValueError: String doesn't match pattern

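Of course, python_scanf won't work with more complex patterns like %.4f or %r. It also returns every field as a string, and it treats the literal text in the pattern as regular expression syntax, so characters such as "." or "(" in the format string are not matched literally.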
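A slightly more defensive variant could escape the literal text and convert each field to its type. The sketch below is only that, a sketch: the name scanf_like, the conversion table, and the choice to make %s match a run of non-whitespace (as C's sscanf does) are all assumptions, and only %d, %f, %s and %% are recognized.

```python
import re

# Each supported conversion maps to (regex fragment, converter).
# This table is an illustrative assumption, not a full printf grammar.
_CONVERSIONS = {
    'd': (r'([-+]?\d+)', int),
    'f': (r'([-+]?\d+(?:\.\d+)?)', float),
    's': (r'(\S+)', str),  # sscanf-style: stops at whitespace
}

def scanf_like(text, fmt):
    """Parse text according to a small subset of %-style specifiers."""
    regex_parts = []
    converters = []
    # Splitting on a capturing group keeps the specifier letters:
    # even indexes hold literal text, odd indexes hold 'd', 'f', 's' or '%'.
    pieces = re.split(r'%([dfs%])', fmt)
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            # Literal text must match exactly, so escape regex metacharacters.
            regex_parts.append(re.escape(piece))
        elif piece == '%':
            # '%%' in the format stands for a single literal percent sign.
            regex_parts.append('%')
        else:
            fragment, convert = _CONVERSIONS[piece]
            regex_parts.append(fragment)
            converters.append(convert)
    match = re.fullmatch(''.join(regex_parts), text)
    if match is None:
        raise ValueError("String doesn't match pattern")
    return tuple(conv(group) for conv, group in zip(converters, match.groups()))

print(scanf_like("12 ooops", "%d ooo%s"))
```

Unsupported specifiers such as %.4f or %r are not rejected by this sketch; they are simply treated as literal text, so the match fails unless the input contains those exact characters.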
