
Pretty-print dataclasses prettier

Python data class instances also include a string representation method, but its output isn't really sufficient for pretty-printing when classes have more than a few fields and/or long field values.

Basically, I'm looking for a way to customize the default dataclass string representation, or for a pretty-printer that understands data classes and prints them more nicely.

So, it's just a small customization I have in mind: adding a line break after each field while indenting lines after the first one.

For example, instead of

x = InventoryItem('foo', 23)
print(x) # =>
InventoryItem(name='foo', unit_price=23, quantity_on_hand=0)

I want to get a string representation like this:

x = InventoryItem('foo', 23)
print(x) # =>
InventoryItem(
    name='foo',
    unit_price=23,
    quantity_on_hand=0
)

Or something similar. A pretty-printer could perhaps get even fancier, e.g. by aligning the = characters.

Of course, it should also work recursively, e.g. fields that are themselves dataclasses should be indented further.
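For the single-level case only, a minimal sketch is to write `__repr__` by hand using `dataclasses.fields()`. This assumes the `InventoryItem` definition from the dataclasses documentation, which the question doesn't show. It does not meet the recursive requirement: a nested dataclass with the same `__repr__` would have only its first line indented.

```python
from dataclasses import dataclass, fields


@dataclass(repr=False)  # don't generate __repr__; we define our own below
class InventoryItem:
    """Class for keeping track of an item in inventory (from the Python docs)."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __repr__(self):
        # One "name=value" line per field, each indented by four spaces.
        lines = [f"    {f.name}={getattr(self, f.name)!r}" for f in fields(self)]
        return f"{type(self).__name__}(\n" + ",\n".join(lines) + "\n)"


x = InventoryItem('foo', 23)
print(x)
```

This prints exactly the layout requested above, with a trailing-comma-free last field and the closing parenthesis on its own line.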

As of 2021 (Python 3.9), Python's standard pprint module doesn't support dataclasses yet.

However, the prettyprinter package supports dataclasses and provides some nice pretty-printing features.

Example:

[ins] In [1]: from dataclasses import dataclass
         ...:
         ...: @dataclass
         ...: class Point:
         ...:     x: int
         ...:     y: int
         ...:
         ...: @dataclass
         ...: class Coords:
         ...:     my_points: list
         ...:     my_dict: dict
         ...:
         ...: coords = Coords([Point(1, 2), Point(3, 4)], {'a': (1, 2), (1, 2): 'a'})

[nav] In [2]: import prettyprinter as pp

[ins] In [3]: pp.pprint(coords)
Coords(my_points=[Point(x=1, y=2), Point(x=3, y=4)], my_dict={'a': (1, 2), (1, 2): 'a'})

Dataclass support isn't enabled by default, so:

[nav] In [4]: pp.install_extras()
[ins] In [5]: pp.pprint(coords)
Coords(
    my_points=[Point(x=1, y=2), Point(x=3, y=4)],
    my_dict={'a': (1, 2), (1, 2): 'a'}
)

Or, to force everything (at every nesting level) onto its own line:

[ins] In [6]: pp.pprint(coords, width=1)
Coords(
    my_points=[
        Point(
            x=1,
            y=2
        ),
        Point(
            x=3,
            y=4
        )
    ],
    my_dict={
        'a': (
            1,
            2
        ),
        (
            1,
            2
        ): 'a'
    }
)

prettyprinter can even syntax-highlight its output (see cpprint()).


Considerations:

Python 3.10+ supports pretty-printing dataclasses:

Python 3.10.0b2+ (heads/3.10:f807a4fad4, Sep  4 2021, 18:58:04) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dataclasses import dataclass
>>> @dataclass
... class Literal:
...     value: 'Any'
... 
>>> @dataclass
... class Binary:
...     left: 'Binary | Literal'
...     operator: str
...     right: 'Binary | Literal'
... 
>>> from pprint import pprint
>>> # magic happens here
>>> pprint(
... Binary(Binary(Literal(2), '*', Literal(100)), '+', Literal(50)))
Binary(left=Binary(left=Literal(value=2),
                   operator='*',
                   right=Literal(value=100)),
       operator='+',
       right=Literal(value=50))

Sadly, Python 3.10 is not in wide use (...yet, as of 2021).

We can use dataclasses.fields to recurse through nested dataclasses and pretty print them:

from collections.abc import Mapping, Iterable
from dataclasses import is_dataclass, fields

def pretty_print(obj, indent=4):
    """
    Pretty prints a (possibly deeply-nested) dataclass.
    Each new block will be indented by `indent` spaces (default is 4).
    """
    print(stringify(obj, indent))

def stringify(obj, indent=4, _indents=0):
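    # Strings and bytes are iterable, but should be printed as literals;
    # repr() also escapes any quotes inside them.
    if isinstance(obj, (str, bytes)):
        return repr(obj)

    # is_dataclass() is also True for dataclass *classes*, which have no
    # field values to show, so only dataclass instances get expanded.
    if isinstance(obj, type) or (
        not is_dataclass(obj) and not isinstance(obj, (Mapping, Iterable))
    ):
        return str(obj)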

    this_indent = indent * _indents * ' '
    next_indent = indent * (_indents + 1) * ' '
    start, end = f'{type(obj).__name__}(', ')'  # dicts, lists, and tuples will re-assign this

    if is_dataclass(obj):
        body = '\n'.join(
            f'{next_indent}{field.name}='
            f'{stringify(getattr(obj, field.name), indent, _indents + 1)},' for field in fields(obj)
        )

    elif isinstance(obj, Mapping):
        if isinstance(obj, dict):
            start, end = '{}'

        body = '\n'.join(
            f'{next_indent}{stringify(key, indent, _indents + 1)}: '
            f'{stringify(value, indent, _indents + 1)},' for key, value in obj.items()
        )

    else:  # is Iterable
        if isinstance(obj, list):
            start, end = '[]'
        elif isinstance(obj, tuple):
            start = '('

        body = '\n'.join(
            f'{next_indent}{stringify(item, indent, _indents + 1)},' for item in obj
        )

    if not body:  # empty container or field-less dataclass: keep it on one line
        return f'{start}{end}'
    return f'{start}\n{body}\n{this_indent}{end}'

We can test it with a nested dataclass:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

@dataclass
class Coords:
    my_points: list
    my_dict: dict

coords = Coords([Point(1, 2), Point(3, 4)], {'a': (1, 2), (1, 2): 'a'})

pretty_print(coords)

# Coords(
#     my_points=[
#         Point(
#             x=1,
#             y=2,
#         ),
#         Point(
#             x=3,
#             y=4,
#         ),
#     ],
#     my_dict={
#         'a': (
#             1,
#             2,
#         ),
#         (
#             1,
#             2,
#         ): 'a',
#     },
# )

This should be general enough to cover most cases. Hope this helps!

You should probably use prettyprinter, but if you can't add dependencies for some reason, you could use this instead. It is ever so slightly shorter than salt-die's example because it delegates non-dataclass values to pprint:

import dataclasses
import pprint


def dcppformat(x, chars=0):
    def parts():
        if dataclasses.is_dataclass(x):
            yield type(x).__name__ + "("

            def fields():
                for field in dataclasses.fields(x):
                    nindent = chars + len(field.name) + 4
                    value = getattr(x, field.name)
                    rep_value = dcppformat(value)
                    yield " " * (chars + 3) + indent_body_chars(
                        "{}={}".format(field.name, rep_value), chars=nindent
                    )

            yield ",\n".join(fields())
            yield " " * chars + ")"
        else:
            yield pprint.pformat(x)

    return "\n".join(parts())


def indent(x, level=1):
    return indent_chars(x, level * 4)


def indent_chars(x, chars=1):
    return "\n".join(" " * chars + p for p in x.split("\n"))


def indent_body_chars(x, chars=4):
    a, *b = x.split("\n")
    if b:
        return a + "\n" + indent_chars("\n".join(b), chars=chars,)
    else:
        return a


def dcpprint(x):
    print(dcppformat(x))


def test():
    @dataclasses.dataclass
    class A:
        a: object
        b: str

    dcpprint(A(a=A(a=None, b={"a": 1, "c": 1, "long": "a" * 100}), b=2))


if __name__ == "__main__":
    test()

All I cared about was having the fields on separate lines, so I ended up using dataclasses.asdict along with pprint.pprint:

from dataclasses import dataclass, asdict
from pprint import pprint

@dataclass
class SomeClass:
    there: int
    needs: int
    to: int
    be: int
    many: int
    fields: int
    for_: int
    it: int
    to2: int
    work: int


a = SomeClass(there=1, needs=2, to=3, be=4, many=5, fields=6, for_=7, it=8, to2=9, work=10)

pprint(asdict(a), sort_dicts=False)

Output:

{'there': 1,
 'needs': 2,
 'to': 3,
 'be': 4,
 'many': 5,
 'fields': 6,
 'for_': 7,
 'it': 8,
 'to2': 9,
 'work': 10}

I was using Python 3.9.
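One thing worth knowing about this approach: asdict() is recursive, so nested dataclasses (including ones inside lists, tuples and dicts) also become plain dicts, and field values are deep-copied along the way. The trade-off is that class names disappear from the output. A small sketch reusing the Point/Coords classes from the earlier answer, redefined here so it runs on its own:

```python
from dataclasses import dataclass, asdict
from pprint import pformat


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Coords:
    my_points: list
    my_dict: dict


coords = Coords([Point(1, 2), Point(3, 4)], {'a': (1, 2), (1, 2): 'a'})

# The Points inside the list are converted to dicts too; sort_dicts=False
# keeps field order, and width=1 forces one item per line.
print(pformat(asdict(coords), sort_dicts=False, width=1))
```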

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM