The standard library in 3.7 can recursively convert a dataclass into a dict (example from the docs):
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: List[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert asdict(c) == tmp
I am looking for a way to turn a dict back into a dataclass when there is nesting. Something like C(**tmp) only works if the fields of the dataclass are simple types and not themselves dataclasses. I am familiar with jsonpickle, which however comes with a prominent security warning.
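To make the failure mode concrete, here is a minimal reproduction (using the Point/C classes from above) showing that C(**tmp) leaves the nested dicts unconverted:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: List[Point]

tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

# Unpacking only fills the top-level fields; the nested dicts are
# stored as-is instead of being converted back into Point instances.
c = C(**tmp)
assert c.mylist == [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]
assert not isinstance(c.mylist[0], Point)
```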
EDIT:
Answers have suggested the following libraries:
I'm the author of dacite - the tool that simplifies the creation of data classes from dictionaries.

This library has only one function, from_dict - here is a quick example of usage:
from dataclasses import dataclass
from dacite import from_dict

@dataclass
class User:
    name: str
    age: int
    is_active: bool

data = {
    'name': 'john',
    'age': 30,
    'is_active': True,
}

user = from_dict(data_class=User, data=data)
assert user == User(name='john', age=30, is_active=True)
Moreover, dacite supports the following features:
... and it's well tested - 100% code coverage!
To install dacite, simply use pip (or pipenv):
$ pip install dacite
Below is the CPython implementation of asdict – or specifically, the internal recursive helper function _asdict_inner that it uses:
# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py
def _asdict_inner(obj, dict_factory):
    if _is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # [large block of author comments]
        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        # [ditto]
        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_asdict_inner(k, dict_factory),
                          _asdict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
asdict simply calls the above with some assertions, and with dict_factory=dict by default.
How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?
1. Adding type information
My attempt involved creating a custom return wrapper inheriting from dict:
class TypeDict(dict):
    def __init__(self, t, *args, **kwargs):
        super(TypeDict, self).__init__(*args, **kwargs)

        if not isinstance(t, type):
            raise TypeError("t must be a type")

        self._type = t

    @property
    def type(self):
        return self._type
Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclasses:
# only use dict for now; easy to add back later
def _todict_inner(obj):
    if is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _todict_inner(getattr(obj, f.name))
            result.append((f.name, value))
        return TypeDict(type(obj), result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        return type(obj)(*[_todict_inner(v) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_todict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_todict_inner(k), _todict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
Imports:
from dataclasses import dataclass, fields, is_dataclass
# thanks to Patrick Haugh
from typing import *
# deepcopy
import copy
Functions used:
# copy of the internal function _is_dataclass_instance
def is_dataclass_instance(obj):
    return is_dataclass(obj) and not isinstance(obj, type)
# the adapted version of asdict
def todict(obj):
    if not is_dataclass_instance(obj):
        raise TypeError("todict() should be called on dataclass instances")
    return _todict_inner(obj)
Tests with the example dataclasses:
c = C([Point(0, 0), Point(10, 4)])
print(c)
cd = todict(c)
print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
print(cd.type)
# <class '__main__.C'>
Results are as expected.
2. Converting back to a dataclass
The recursive routine used by asdict can be re-used for the reverse process, with some relatively minor changes:
def _fromdict_inner(obj):
    # reconstruct the dataclass using the type tag
    if is_dataclass_dict(obj):
        result = {}
        for name, data in obj.items():
            result[name] = _fromdict_inner(data)
        return obj.type(**result)

    # exactly the same as before (without the tuple clause)
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_fromdict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
Functions used:
def is_dataclass_dict(obj):
    return isinstance(obj, TypeDict)

def fromdict(obj):
    if not is_dataclass_dict(obj):
        raise TypeError("fromdict() should be called on TypeDict instances")
    return _fromdict_inner(obj)
Test:
c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)
print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
Again as expected.
All it takes is a five-liner:
import dataclasses

def dataclass_from_dict(klass, d):
    try:
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{f: dataclass_from_dict(fieldtypes[f], d[f]) for f in d})
    except:
        return d  # Not a dataclass field
Sample usage:
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Line:
    a: Point
    b: Point

line = Line(Point(1, 2), Point(3, 4))
assert line == dataclass_from_dict(Line, asdict(line))
Full code, including to/from json, here at gist: https://gist.github.com/gatopeich/1efd3e1e4269e1e98fae9983bb914f22
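One caveat worth flagging: the bare except in the five-liner above returns the input unchanged on *any* construction error, so a missing or misspelled key fails silently rather than raising. A small self-contained demonstration (re-declaring the helper and the Point/Line classes so it runs on its own):

```python
import dataclasses
from dataclasses import dataclass

def dataclass_from_dict(klass, d):
    try:
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{f: dataclass_from_dict(fieldtypes[f], d[f]) for f in d})
    except:
        return d  # Not a dataclass field

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Line:
    a: Point
    b: Point

# 'b' is missing, so Line(**...) raises TypeError inside the helper;
# the bare except swallows it and hands back the raw input dict.
result = dataclass_from_dict(Line, {'a': {'x': 1, 'y': 2}})
assert result == {'a': {'x': 1, 'y': 2}}
assert not isinstance(result, Line)
```

Catching only TypeError and KeyError (as a later answer does) keeps the recursion-breaking behaviour while letting genuine mistakes surface.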
Using no additional modules, you can make use of the __post_init__ function to automatically convert the dict values to the correct type. This function is called after __init__.
from dataclasses import dataclass, asdict

@dataclass
class Bar:
    fee: str
    far: str

@dataclass
class Foo:
    bar: Bar

    def __post_init__(self):
        if isinstance(self.bar, dict):
            self.bar = Bar(**self.bar)

foo = Foo(bar=Bar(fee="La", far="So"))
d = asdict(foo)
print(d)  # {'bar': {'fee': 'La', 'far': 'So'}}
o = Foo(**d)
print(o)  # Foo(bar=Bar(fee='La', far='So'))
This solution has the added benefit of working with non-dataclass objects as well: as long as the value's string form can be converted back, it's fair game. For example, it can be used to keep str fields as ipaddress.IPv4Address instances internally.
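That IPv4Address idea can be sketched like so (Server is a hypothetical class for illustration; the stdlib type is ipaddress.IPv4Address):

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Server:
    address: ipaddress.IPv4Address

    def __post_init__(self):
        # promote a plain string to an IPv4Address internally
        if isinstance(self.address, str):
            self.address = ipaddress.IPv4Address(self.address)

s = Server(address="192.168.0.1")
assert isinstance(s.address, ipaddress.IPv4Address)

# round-tripping through a string works because __post_init__ re-converts;
# use str(s.address) when emitting a JSON-safe payload
s2 = Server(address=str(s.address))
assert s == s2
```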
You can use mashumaro to create a dataclass object from a dict according to the scheme. The Mixin from this library adds convenient from_dict and to_dict methods to dataclasses:
from dataclasses import dataclass
from typing import List
from mashumaro import DataClassDictMixin

@dataclass
class Point(DataClassDictMixin):
    x: int
    y: int

@dataclass
class C(DataClassDictMixin):
    mylist: List[Point]

p = Point(10, 20)
tmp = {'x': 10, 'y': 20}
assert p.to_dict() == tmp
assert Point.from_dict(tmp) == p

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.to_dict() == tmp
assert C.from_dict(tmp) == c
If your goal is to produce JSON from and to existing, predefined dataclasses, then just write custom encoder and decoder hooks. Do not use dataclasses.asdict() here; instead, record in JSON a (safe) reference to the original dataclass.
jsonpickle is not safe because it stores references to arbitrary Python objects and passes data to their constructors. With such references I can get jsonpickle to reference internal Python data structures and to create and execute functions, classes and modules at will. But that doesn't mean you can't handle such references safely. Just verify that you only import (not call), and then verify that the object is an actual dataclass type, before you use it.
The framework can be made generic enough, but still limited to JSON-serialisable types plus dataclass-based instances:
import dataclasses
import importlib
import sys

def dataclass_object_dump(ob):
    datacls = type(ob)
    if not dataclasses.is_dataclass(datacls):
        raise TypeError(f"Expected dataclass instance, got '{datacls!r}' object")
    mod = sys.modules.get(datacls.__module__)
    if mod is None or not hasattr(mod, datacls.__qualname__):
        raise ValueError(f"Can't resolve '{datacls!r}' reference")
    ref = f"{datacls.__module__}.{datacls.__qualname__}"
    fields = (f.name for f in dataclasses.fields(ob))
    return {**{f: getattr(ob, f) for f in fields}, '__dataclass__': ref}

def dataclass_object_load(d):
    ref = d.pop('__dataclass__', None)
    if ref is None:
        return d

    try:
        modname, hasdot, qualname = ref.rpartition('.')
        module = importlib.import_module(modname)
        datacls = getattr(module, qualname)
        if not dataclasses.is_dataclass(datacls) or not isinstance(datacls, type):
            raise ValueError
        return datacls(**d)
    except (ModuleNotFoundError, ValueError, AttributeError, TypeError):
        raise ValueError(f"Invalid dataclass reference {ref!r}") from None
This uses JSON-RPC-style class hints to name the dataclass, and on loading this is verified to still be a data class with the same fields. No type checking is done on the values of the fields (as that's a whole different kettle of fish).
Use these as the default and object_hook arguments to json.dump[s]() and json.load[s]() respectively:
>>> print(json.dumps(c, default=dataclass_object_dump, indent=4))
{
    "mylist": [
        {
            "x": 0,
            "y": 0,
            "__dataclass__": "__main__.Point"
        },
        {
            "x": 10,
            "y": 4,
            "__dataclass__": "__main__.Point"
        }
    ],
    "__dataclass__": "__main__.C"
}
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load)
C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load) == c
True
Or create instances of the JSONEncoder and JSONDecoder classes with those same hooks.
Instead of using fully qualifying module and class names, you could also use a separate registry to map permissible type names; check against the registry on encoding, and again on decoding to ensure you don't forget to register dataclasses as you develop.
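The registry idea can be sketched like this (a minimal illustration, not the answer's code; register, registry_dump and registry_load are hypothetical names):

```python
import dataclasses
import json

# Whitelist of dataclasses allowed in (de)serialisation; decoding can
# never import or reference anything outside this mapping.
_registry = {}

def register(cls):
    _registry[cls.__qualname__] = cls
    return cls

def registry_dump(ob):
    name = type(ob).__qualname__
    if _registry.get(name) is not type(ob):
        raise TypeError(f"{name} is not a registered dataclass")
    fields = (f.name for f in dataclasses.fields(ob))
    return {**{f: getattr(ob, f) for f in fields}, '__dataclass__': name}

def registry_load(d):
    name = d.pop('__dataclass__', None)
    if name is None:
        return d
    try:
        return _registry[name](**d)
    except KeyError:
        raise ValueError(f"Unregistered dataclass reference {name!r}") from None

@register
@dataclasses.dataclass
class Point:
    x: int
    y: int

payload = json.dumps(Point(1, 2), default=registry_dump)
assert json.loads(payload, object_hook=registry_load) == Point(1, 2)
```

An unregistered tag then fails loudly at decode time instead of importing anything.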
A possible solution that I haven't seen mentioned yet is to use dataclasses-json. This library provides conversions of dataclass instances to/from JSON, but also to/from dict (like dacite and mashumaro, which were suggested in earlier answers).

dataclasses-json requires decorating the classes with @dataclass_json in addition to @dataclass. The decorated classes then get a couple of member functions for conversions to/from JSON and to/from dict:
from_dict(...)
from_json(...)
to_dict(...)
to_json(...)
Here is a slightly modified version of the original code in the question. I've added the required @dataclass_json decorators and asserts for the conversion from dicts to instances of Point and C:
from dataclasses import dataclass, asdict
from dataclasses_json import dataclass_json
from typing import List

@dataclass_json
@dataclass
class Point:
    x: int
    y: int

@dataclass_json
@dataclass
class C:
    mylist: List[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}
assert p == Point.from_dict({'x': 10, 'y': 20})

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert asdict(c) == tmp
assert c == C.from_dict(tmp)
A simple solution that supports lists as well (and can be extended for other generic uses):
from dataclasses import dataclass, asdict, fields, is_dataclass
from types import GenericAlias

def asdataclass(klass, d):
    if not is_dataclass(klass):
        return d
    values = {}
    for f in fields(klass):
        if isinstance(f.type, GenericAlias) and f.type.__origin__ == list:
            values[f.name] = [asdataclass(f.type.__args__[0], d2) for d2 in d[f.name]]
        else:
            values[f.name] = asdataclass(f.type, d[f.name])
    return klass(**values)

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: list[Point]
    title: str = ""

c = C([Point(0, 0), Point(10, 4)])
assert c == asdataclass(C, asdict(c))
I know there are probably tons of JSON serialization libraries out there by now, and to be honest I may have stumbled upon this question a bit late. However, a newer (and well-tested) option is the dataclass-wizard library, which has recently (as of the v0.18.0 release) moved to Production/Stable status.

It has pretty solid support for typing generics from the typing module, as well as other niche use cases such as dataclasses in Union types and patterned dates and times. Other nice-to-have features that I have personally found quite useful, such as automatic key-casing transforms (e.g. camel to snake) and implicit type casts (e.g. string to annotated int), are implemented as well.
Ideal usage is with the JSONWizard Mixin class, which provides useful class methods such as:

from_json
from_dict / from_list
to_dict
to_json / list_to_json

Here's a pretty self-explanatory usage that has been tested in Python 3.7+ with the included __future__ import:
from __future__ import annotations
from dataclasses import dataclass
from dataclass_wizard import JSONWizard

@dataclass
class C(JSONWizard):
    my_list: list[Point]

@dataclass
class Point(JSONWizard):
    x: int
    y: int

# Serialize Point instance
p = Point(10, 20)
tmp = {'x': 10, 'y': 20}
assert p.to_dict() == tmp
assert Point.from_dict(tmp) == p

c = C([Point(0, 0), Point(10, 4)])
# default case transform is 'camelCase', though this can be overridden
# with a custom Meta config supplied for the main dataclass.
tmp = {'myList': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.to_dict() == tmp
assert C.from_dict(tmp) == c
NB: It's worth noting that, technically, you only need to sub-class the main dataclass, i.e. the model being serialized; the nested dataclasses can be left alone if desired.

If a class inheritance model is not desired at all, the other option is to use the exported helper functions fromdict and asdict to convert dataclass instances to/from Python dict objects as needed.
undictify is a library which could be of help. Here is a minimal usage example:
import json
from dataclasses import dataclass
from typing import List, Optional

from undictify import type_checked_constructor

@type_checked_constructor(skip=True)
@dataclass
class Heart:
    weight_in_kg: float
    pulse_at_rest: int

@type_checked_constructor(skip=True)
@dataclass
class Human:
    id: int
    name: str
    nick: Optional[str]
    heart: Heart
    friend_ids: List[int]

tobias_dict = json.loads('''
    {
        "id": 1,
        "name": "Tobias",
        "heart": {
            "weight_in_kg": 0.31,
            "pulse_at_rest": 52
        },
        "friend_ids": [2, 3, 4, 5]
    }''')
tobias = Human(**tobias_dict)
Validobj does just that. Compared to other libraries, it provides a simpler interface (just one function at the moment) and emphasizes informative error messages. For example, given a schema like
import dataclasses
from typing import Optional, List

@dataclasses.dataclass
class User:
    name: str
    phone: Optional[str] = None
    tasks: List[str] = dataclasses.field(default_factory=list)
One gets an error like
>>> import validobj
>>> validobj.parse_input({
... 'phone': '555-1337-000', 'address': 'Somewhereville', 'nme': 'Zahari'}, User
... )
Traceback (most recent call last):
...
WrongKeysError: Cannot process value into 'User' because fields do not match.
The following required keys are missing: {'name'}. The following keys are unknown: {'nme', 'address'}.
Alternatives to invalid value 'nme' include:
- name
All valid options are:
- name
- phone
- tasks
for a typo on a given field.
I would like to suggest using the Composite pattern to solve this. The main advantage is that you can keep adding classes to the hierarchy and have them behave the same way.
from dataclasses import dataclass
from typing import List

@dataclass
class CompositeDict:
    def as_dict(self):
        retval = dict()
        for key, value in self.__dict__.items():
            if key in self.__dataclass_fields__.keys():
                if type(value) is list:
                    retval[key] = [item.as_dict() for item in value]
                else:
                    retval[key] = value
        return retval

@dataclass
class Point(CompositeDict):
    x: int
    y: int

@dataclass
class C(CompositeDict):
    mylist: List[Point]

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.as_dict() == tmp
As a side note, you could employ a factory pattern within the CompositeDict class to handle other cases like nested dicts, tuples and such, which would save much boilerplate.
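One way to flesh out that idea without extra boilerplate is to dispatch on each value's runtime type inside as_dict. This is a hedged sketch of such a variant (not the author's implementation), which also handles nested dataclass fields, tuples and dicts:

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List

@dataclass
class CompositeDict:
    def as_dict(self):
        # dispatch on the runtime type of each value, so nested
        # dataclasses, lists, tuples and dicts all convert uniformly
        def convert(value):
            if is_dataclass(value) and not isinstance(value, type):
                return {f.name: convert(getattr(value, f.name)) for f in fields(value)}
            if isinstance(value, (list, tuple)):
                return type(value)(convert(v) for v in value)
            if isinstance(value, dict):
                return {k: convert(v) for k, v in value.items()}
            return value

        return convert(self)

@dataclass
class Point(CompositeDict):
    x: int
    y: int

@dataclass
class C(CompositeDict):
    mylist: List[Point]

c = C([Point(0, 0), Point(10, 4)])
assert c.as_dict() == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
```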
from validated_dc import ValidatedDC
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Foo(ValidatedDC):
    foo: int

@dataclass
class Bar(ValidatedDC):
    bar: Union[Foo, List[Foo]]

foo = {'foo': 1}
instance = Bar(bar=foo)
print(instance.get_errors())  # None
print(instance)  # Bar(bar=Foo(foo=1))

list_foo = [{'foo': 1}, {'foo': 2}]
instance = Bar(bar=list_foo)
print(instance.get_errors())  # None
print(instance)  # Bar(bar=[Foo(foo=1), Foo(foo=2)])
validated_dc:
https://github.com/EvgeniyBurdin/validated_dc
And see a more detailed example:
https://github.com/EvgeniyBurdin/validated_dc/blob/master/examples/detailed.py
I really think that the concept presented by gatopeich in this answer is the best approach to this question. I've fixed and civilized his code. This is a corrected function to load a dataclass back from a dictionary:
import dataclasses
import typing as t

def dataclass_from_dict(cls: type, src: t.Mapping[str, t.Any]) -> t.Any:
    field_types_lookup = {
        field.name: field.type
        for field in dataclasses.fields(cls)
    }

    constructor_inputs = {}
    for field_name, value in src.items():
        try:
            constructor_inputs[field_name] = dataclass_from_dict(field_types_lookup[field_name], value)
        except TypeError:
            # a TypeError from the fields() call in the recursive call
            # indicates that the field is not a dataclass; this is how we
            # break the recursion. If not a dataclass, no loading is needed.
            constructor_inputs[field_name] = value
        except KeyError:
            # similar: field not defined on the dataclass, pass as plain value
            constructor_inputs[field_name] = value

    return cls(**constructor_inputs)
Then you can test with the following:

from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Line:
    a: Point
    b: Point

p1, p2 = Point(1, 1), Point(2, 2)
line = Line(p1, p2)
assert line == dataclass_from_dict(Line, asdict(line))
from dataclasses import dataclass, is_dataclass

@dataclass
class test2:
    a: str = 'name'
    b: int = 222

@dataclass
class test:
    a: str = 'name'
    b: int = 222
    t: test2 = None

a = test(a=2222222222, t=test2(a="ssss"))
print(a)

def dataclass_from_dict(schema: any, data: dict):
    data_updated = {
        key: (
            data[key]
            if not is_dataclass(schema.__annotations__[key])
            else dataclass_from_dict(schema.__annotations__[key], data[key])
        )
        for key in data.keys()
    }
    return schema(**data_updated)

print(dataclass_from_dict(test, {'a': 1111111, 't': {'a': 'nazwa'}}))
A possible alternative might be the lightweight chili library:
from dataclasses import dataclass, asdict
from typing import List

from chili import init_dataclass

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: List[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert asdict(c) == tmp
assert c == init_dataclass(tmp, C)
Chili supports almost the entire typing module, including custom Generic types. You can read more here: https://github.com/kodemore/chili

Installation can be done through pip or poetry by simply running:

pip install chili

or

poetry add chili

It has only one dependency: typing-extensions.
Adding an alternative option - convtools models (docs / github).

The vision of this library is:

It also does its best to allow for automated error processing (link).
1. Validation only
from typing import List

from convtools.contrib.models import DictModel, build

class Point(DictModel):
    x: int
    y: int

class C(DictModel):
    mylist: List[Point]

point, errors = build(Point, {"x": 10, "y": 20})
"""
>>> In [2]: point
>>> Out[2]: Point(x=10, y=20)
>>>
>>> In [3]: point.to_dict()
>>> Out[3]: {'x': 10, 'y': 20}
"""

obj, errors = build(C, {"mylist": [{"x": 0, "y": 0}, {"x": 10, "y": 4}]})
"""
>>> In [8]: obj
>>> Out[8]: C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
>>>
>>> In [9]: obj.to_dict()
>>> Out[9]: {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
"""
2. Type casting
from convtools.contrib.models import cast

class Point(DictModel):
    # no type casting here
    x: int
    # casting to int:
    #  - when run with no args, it infers the caster from the output type
    #  - OR you can pass built-in/custom casters, e.g. casters.IntLossy()
    y: int = cast()

class Point(DictModel):
    x: int
    y: int

    class Meta:
        # forces all fields to be cast to expected types
        cast = True
        # # JIC: to override automatic caster inference:
        # cast_overrides = {
        #     date: casters.DateFromStr("%m/%d/%Y")
        # }