
Reversing C-style format strings in Python (`%`)

Introduction and setup

Suppose I have a 'template'* string of the form,

>>> template = """My %(pet)s ate my %(object)s.
... This is a float: %(number)0.2f.
... %(integer)10d is an integer on a newline."""

With this template I can generate a new string:

>>> d = dict(pet='dog', object='homework', number=7.7375487, integer=743898,)

>>> out_string = template % d

>>> print(out_string)
My dog ate my homework.
This is a float: 7.74.
    743898 is an integer on a newline.

How nice!

Question

I'd like to go the other way: apply template to out_string to recover a new dict. Something like,

>>> d_approx_copy = reverse_cstyle_template(out_string, template)
>>> print(d_approx_copy)
{'pet': 'dog', 'object': 'homework', 'number': 7.74, 'integer': 743898}

Is there a Pythonic way to do this? Does an implementation already exist?**

Notes

*: I'm not using string.Template because, AFAIK, it doesn't currently support reversing.

**: I am aware of the risk of losing precision in number (from 7.7375487 to 7.74). I can deal with that; I'm just looking for a simple way to do this.

While writing up this question, I could not find an existing tool that reverses C-style format strings this way. In other words, I think the answer to this question is that the reverse_cstyle_template function I was looking for does not currently exist.

While researching this topic, I found many questions/answers similar to this one that use regular expressions (e.g. 1, 2, 3). However, I wanted something simpler, and I did not want to use one template string for formatting and a different one for parsing.
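For comparison, here is a minimal regex-based sketch of the hypothetical reverse_cstyle_template from the question. It is not an existing library function, and it assumes a lot: only %% and %(name)[width][.precision] fields with the s, d and f conversions are supported (no flags such as - or +), each name appears only once, and s values are non-empty.

```python
import re

# Hypothetical sketch: supports only %% and %(name)[width][.precision]{s,d,f}.
# Flags such as '-' or '+', and other conversion types, are not handled.
_FIELD = re.compile(r"%%|%\((\w+)\)\d*(?:\.\d+)?([sdf])")

# Regex used to capture the text each conversion type produces.
_VALUE = {
    "s": r".+?",                        # any non-empty text (lazy)
    "d": r"[-+]?\d+",                   # integer
    "f": r"[-+]?(?:\d+\.?\d*|\.\d+)",   # fixed-point float
}
_CONVERT = {"s": str, "d": int, "f": float}


def reverse_cstyle_template(text, template):
    """Recover a dict of values from text produced by `template % values`."""
    parts = []
    converters = {}
    pos = 0
    for m in _FIELD.finditer(template):
        # Literal text between fields must match exactly.
        parts.append(re.escape(template[pos:m.start()]))
        if m.group(0) == "%%":
            parts.append(re.escape("%"))
        else:
            name, conv = m.groups()
            # A width right-aligns numbers, so allow leading spaces.
            padding = r"\s*" if conv in "df" else ""
            parts.append(f"{padding}(?P<{name}>{_VALUE[conv]})")
            converters[name] = _CONVERT[conv]
        pos = m.end()
    parts.append(re.escape(template[pos:]))

    match = re.fullmatch("".join(parts), text, flags=re.DOTALL)
    if match is None:
        raise ValueError("text does not match template")
    return {name: converters[name](value)
            for name, value in match.groupdict().items()}


template = """My %(pet)s ate my %(object)s.
This is a float: %(number)0.2f.
%(integer)10d is an integer on a newline."""
d = dict(pet='dog', object='homework', number=7.7375487, integer=743898)
print(reverse_cstyle_template(template % d, template))
```

The lazy .+? used for s fields is also why this kind of approach is fragile: if a string value contains the literal text that follows it in the template, the match can be split somewhere unexpected.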

This eventually led me to the format string syntax and Richard Jones' parse package. For example, the template above is written in format string syntax as,

>>> template = """My {pet} ate my {object}.
... This is a float: {number:0.2f}.
... {integer:10d} is an integer on a newline."""

With this template, one can use the built-in str.format to create a new string from d:

>>> out_string = template.format(**d)

Then use the parse package to get d_approx_copy:

>>> from parse import parse
>>> d_approx_copy = parse(template, out_string).named

Note that I've accessed the .named attribute. This is because parse returns a Result object (defined in parse) that holds both the named fields (in .named) and the unnamed, positional 'fixed' fields (in .fixed). For example, if one uses,

>>> template = """My {pet} {}ate my {object}.
... This is a float: {number:0.2f}.
... {integer:10d} is an integer on a newline.
... Here is another 'fixed' input: {}"""

>>> out_string = template.format('spot ', 7, **d)

>>> print(out_string)
My dog spot ate my homework.
This is a float: 7.74.
    743898 is an integer on a newline.
Here is another 'fixed' input: 7

Then we can get both the fixed and the named data back:

>>> data = parse(template, out_string)

>>> print(data.named)
{'pet': 'dog', 'integer': 743898, 'object': 'homework', 'number': 7.74}

>>> print(data.fixed)
('spot ', '7')

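Cool, right?! Note that the untyped {} fields come back as strings ('7', not 7); only fields with a type, such as d or f, are converted.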
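The named/fixed split can also be seen with the standard library alone: string.Formatter().parse tokenizes a format-syntax template into (literal_text, field_name, format_spec, conversion) tuples, and a bare {} shows up with an empty field name. This sketch only inspects the template; it does not call the parse package.

```python
import string

# Format-syntax template from above: four named fields and two bare {}.
template = """My {pet} {}ate my {object}.
This is a float: {number:0.2f}.
{integer:10d} is an integer on a newline.
Here is another 'fixed' input: {}"""

named = []       # (field name, format spec) for each named field
positional = 0   # count of auto-numbered {} fields
for literal, field, spec, conversion in string.Formatter().parse(template):
    if field is None:
        continue          # trailing literal text with no replacement field
    if field == "":
        positional += 1   # bare {}: reported by parse in .fixed
    else:
        named.append((field, spec))

print(named)       # [('pet', ''), ('object', ''), ('number', '0.2f'), ('integer', '10d')]
print(positional)  # 2
```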

Hopefully someday this functionality will be included as a built-in, either in str or in string.Template. For now, though, parse works well for my purposes.
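If you already have %-style templates and still want to keep a single template for both formatting and parsing, one option is to translate them mechanically into format string syntax first. Below is a minimal sketch of such a translator; percent_to_format is a hypothetical helper, not part of Python or parse. It assumes d fields receive ints and only handles %% and %(name)[width][.precision] fields of type s, d and f; any other %-field is copied through unchanged, so check the output.

```python
import re

# Hypothetical helper: handles only %% and %(name)[width][.precision]{s,d,f};
# any other %-field (flags like '-', types like x or e) is copied through as-is.
_PCT_FIELD = re.compile(r"%%|%\((\w+)\)(\d*(?:\.\d+)?)([sdf])")


def _escape_braces(text):
    # Literal braces must be doubled in format string syntax.
    return text.replace("{", "{{").replace("}", "}}")


def percent_to_format(template):
    """Translate a %-style template into an equivalent str.format template."""
    out = []
    pos = 0
    for m in _PCT_FIELD.finditer(template):
        out.append(_escape_braces(template[pos:m.start()]))
        if m.group(0) == "%%":
            out.append("%")  # '%' needs no escaping in format syntax
        else:
            name, spec, conv = m.groups()
            if conv == "s":
                if spec:
                    # '%5s' right-aligns but '{:5}' left-aligns strings.
                    raise ValueError(f"unsupported string field: {m.group(0)!r}")
                out.append("{" + name + "}")  # untyped field, same output as %s
            else:
                # Numbers right-align by default in both syntaxes.
                out.append("{" + name + ":" + spec + conv + "}")
        pos = m.end()
    out.append(_escape_braces(template[pos:]))
    return "".join(out)


pct_template = """My %(pet)s ate my %(object)s.
This is a float: %(number)0.2f.
%(integer)10d is an integer on a newline."""
print(percent_to_format(pct_template))
```

Applied to the %-style template from the question, this produces the format-syntax template shown earlier, which can then go to both str.format and parse.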

Lastly, I think it's important to re-emphasize the loss of precision that occurs through these steps when a precision is given in the format specifier (i.e. 7.7375487 becomes 7.74)! In general, using the precision specifier is probably a bad idea except when creating 'readable' strings (e.g. for 'summary' file output) that are not meant for further processing (i.e. will never be parsed). This, of course, negates the point of this Q/A, but it needs to be mentioned here.
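If the output does need to be parsed later, one way to sidestep the loss (a suggestion, not something either template system requires) is to drop the precision specifier for floats and let Python print them via repr, which in Python 3 gives the shortest string that converts back to the same float. A small stdlib-only check:

```python
# Why a precision specifier is lossy, and how repr()-style output avoids it.
x = 7.7375487

lossy_pct = "%(number)0.2f" % {"number": x}   # rounded to 2 decimal places
exact_pct = "%(number)r" % {"number": x}      # repr(): shortest round-tripping string
exact_fmt = "{number}".format(number=x)        # str(float) equals repr(float) in Python 3

print(lossy_pct, float(lossy_pct) == x)   # 7.74 False
print(exact_pct, float(exact_pct) == x)   # 7.7375487 True
print(exact_fmt, float(exact_fmt) == x)   # 7.7375487 True
```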
