I'm using Celery to run some background tasks. One of the tasks returns an instance of a Python class I created. I want to use JSON to serialize and deserialize this class, given the warnings about using pickle.
Is there a simple built-in way to achieve this?
The class is very simple: it contains 3 attributes, all of which are lists of named tuples, and a couple of methods that perform some calculations on those attributes.
My idea is to serialize/deserialize the 3 attributes, since they define the class.
This is my idea for the encoder, but I'm not sure how to decode the data again.
import json

class JSONSerializable(object):
    def __repr__(self):
        return json.dumps(self.__dict__)

class MySimpleClass(JSONSerializable):
    def __init__(self, p1, p2, p3):  # I only care about p1, p2, p3
        self.p1 = p1
        self.p2 = p2
        self.p3 = p3
        self.abc = p1 + p2 + p3

    def some_calc(self):
        ...
First, and no less important: the warnings against pickle mainly apply if third parties could inject pickled data into your worker stream. If you are certain that your own system creates all the pickled data to be consumed, there is no security problem at all. As for compatibility, it is relatively easy to handle, and automatic if the producers and consumers of your pickle data run the same Python version.
That said, for JSON, you have to create subclasses of Python's json.JSONEncoder and json.JSONDecoder, each of which will need to be passed as the cls argument to all your json.dump(s) and json.load(s) calls.
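To show the cls argument in isolation, here is a minimal sketch. The names ComplexEncoder and ComplexDecoder and the "__complex__" tag are hypothetical and chosen only for illustration; the decoder here uses the standard object_hook parameter rather than overriding decode.

```python
import json


class ComplexEncoder(json.JSONEncoder):
    """Hypothetical encoder: teaches JSON about Python's complex type."""

    def default(self, obj):
        # default() is only called for objects the base encoder cannot handle.
        if isinstance(obj, complex):
            return {"__complex__": True, "real": obj.real, "imag": obj.imag}
        return super().default(obj)  # raises TypeError for unknown types


class ComplexDecoder(json.JSONDecoder):
    """Hypothetical decoder: turns tagged dicts back into complex numbers."""

    def __init__(self, *args, **kwargs):
        # object_hook is called for every decoded JSON object (dict).
        super().__init__(*args, object_hook=self._hook, **kwargs)

    @staticmethod
    def _hook(d):
        if d.get("__complex__"):
            return complex(d["real"], d["imag"])
        return d


# Both classes are passed through the cls argument.
text = json.dumps({"z": 1 + 2j}, cls=ComplexEncoder)
restored = json.loads(text, cls=ComplexDecoder)
```

Without cls=ComplexDecoder, json.loads would return the tagged dictionary unchanged.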
A suggestion is that the default method on the encoder emits a dictionary whose keys are the class __module__, its __name__ and an identifier key, say __custom__, which marks the dictionary as one that must be custom decoded, with the object's data stored under a "data" key.
And on the decoder, you check for the __custom__ key, and then instantiate the class using its __new__ method and populate its __dict__. As with pickle, side effects that are triggered in the class __init__ won't run.
You can later on enhance your decoder and encoder so that, for example, they search the class for a __json_encode__ method that could handle only the desired attributes.
Sample implementation:
import json
import sys

class GenericJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return super().default(obj)
        except TypeError:
            pass
        # Wrap any object the base encoder rejects in a tagged dictionary.
        cls = type(obj)
        result = {
            '__custom__': True,
            '__module__': cls.__module__,
            '__name__': cls.__name__,
            # Call the optional hook; otherwise fall back to the instance dict.
            'data': obj.__dict__ if not hasattr(cls, '__json_encode__') else obj.__json_encode__()
        }
        return result

class GenericJSONDecoder(json.JSONDecoder):
    def decode(self, s):
        result = super().decode(s)
        if not isinstance(result, dict) or not result.get('__custom__', False):
            return result
        module = result['__module__']
        if module not in sys.modules:
            __import__(module)
        cls = getattr(sys.modules[module], result['__name__'])
        if hasattr(cls, '__json_decode__'):
            return cls.__json_decode__(result['data'])
        # Bypass __init__, as pickle does, and restore the attributes directly.
        instance = cls.__new__(cls)
        instance.__dict__.update(result['data'])
        return instance
Interactive test on the console:
In [36]: class A:
    ...:     def __init__(self, a):
    ...:         self.a = a
    ...:
In [37]: a = A('test')
In [38]: b = json.loads(json.dumps(a, cls=GenericJSONEncoder), cls=GenericJSONDecoder)
In [39]: b.a
Out[39]: 'test'
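As a sketch of how the __json_encode__ and __json_decode__ hooks mentioned above could serve the question's class, the example below serializes only p1, p2 and p3 and rebuilds the named tuples on decoding. Several things here are assumptions for illustration: the Point named tuple, the HookEncoder and hook names, and the REGISTRY dictionary, which replaces the module lookup with an explicit list of decodable classes so that the payload cannot trigger arbitrary imports.

```python
import json
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # hypothetical named tuple type


class MySimpleClass:
    def __init__(self, p1, p2, p3):
        self.p1 = p1
        self.p2 = p2
        self.p3 = p3
        self.abc = p1 + p2 + p3  # derived, so it does not need to be stored

    def __json_encode__(self):
        # Only the three defining attributes; named tuples become JSON arrays.
        return {"p1": self.p1, "p2": self.p2, "p3": self.p3}

    @classmethod
    def __json_decode__(cls, data):
        # Rebuild the named tuples, then go through __init__ to recompute abc.
        lists = [[Point(*item) for item in data[name]] for name in ("p1", "p2", "p3")]
        return cls(*lists)


# Assumption: an explicit allow-list of classes that may be decoded.
REGISTRY = {"MySimpleClass": MySimpleClass}


class HookEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(type(obj), "__json_encode__"):
            return {
                "__custom__": True,
                "__name__": type(obj).__name__,
                "data": obj.__json_encode__(),
            }
        return super().default(obj)


def hook(d):
    # Called by json.loads for every decoded JSON object.
    if d.get("__custom__"):
        return REGISTRY[d["__name__"]].__json_decode__(d["data"])
    return d


original = MySimpleClass([Point(1, 2)], [Point(3, 4)], [Point(5, 6)])
text = json.dumps(original, cls=HookEncoder)
restored = json.loads(text, object_hook=hook)
```

Because __json_decode__ calls the constructor, any side effects in __init__ do run in this variant, unlike the __new__ approach in the sample implementation.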
Here is an improved version of the great solution provided by @jsbueno which also works with nested custom types.
import json
from collections.abc import Iterable

def is_iterable(arg):
    # Strings are iterable too, but must be treated as scalars here.
    return isinstance(arg, Iterable) and not isinstance(arg, str)
class GenericJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return super().default(obj)
        except TypeError:
            pass
        cls = type(obj)
        result = {
            '__custom__': True,
            '__module__': cls.__module__,
            '__name__': cls.__name__,
            # Call the optional hook; otherwise fall back to the instance dict.
            'data': obj.__dict__ if not hasattr(cls, '__json_encode__') else obj.__json_encode__()
        }
        return result
class GenericJSONDecoder(json.JSONDecoder):
    def decode(self, s):
        result = super().decode(s)
        return GenericJSONDecoder.instantiate_object(result)

    @staticmethod
    def instantiate_object(result):
        if not isinstance(result, dict):
            if is_iterable(result):
                # Recurse into lists so nested custom objects are rebuilt.
                return [GenericJSONDecoder.instantiate_object(v) for v in result]
            else:
                return result
        if not result.get('__custom__', False):
            # Plain dictionary: rebuild its values recursively.
            return {k: GenericJSONDecoder.instantiate_object(v) for k, v in result.items()}
        import sys
        module = result['__module__']
        if module not in sys.modules:
            __import__(module)
        cls = getattr(sys.modules[module], result['__name__'])
        if hasattr(cls, '__json_decode__'):
            return cls.__json_decode__(result['data'])
        instance = cls.__new__(cls)
        data = {k: GenericJSONDecoder.instantiate_object(v) for k, v in result['data'].items()}
        instance.__dict__.update(data)
        return instance
class C:
    def __init__(self):
        self.c = 133

    def __repr__(self):
        return "C<" + str(self.__dict__) + ">"

class B:
    def __init__(self):
        self.b = {'int': 123, "c": C()}
        self.l = [123, C()]
        self.t = (234, C())
        self.s = "Blah"

    def __repr__(self):
        return "B<" + str(self.__dict__) + ">"

class A:
    class_y = 13

    def __init__(self):
        self.x = B()

    def __repr__(self):
        return "A<" + str(self.__dict__) + ">"
def dumps(obj, *args, **kwargs):
    return json.dumps(obj, *args, cls=GenericJSONEncoder, **kwargs)

def dump(obj, *args, **kwargs):
    return json.dump(obj, *args, cls=GenericJSONEncoder, **kwargs)

def loads(obj, *args, **kwargs):
    return json.loads(obj, *args, cls=GenericJSONDecoder, **kwargs)

def load(obj, *args, **kwargs):
    return json.load(obj, *args, cls=GenericJSONDecoder, **kwargs)
Check it out:
e = dumps(A())
print("ENCODED:\n\n", e)
b = json.loads(e, cls=GenericJSONDecoder)
b = loads(e)
print("\nDECODED:\n\n", b)
Prints:
A<{'x': B<{'b': {'int': 123, 'c': C<{'c': 133}>}, 'l': [123, C<{'c': 133}>], 't': [234, C<{'c': 133}>], 's': 'Blah'}>}>
The original version only reconstructs the A correctly, while all instances of B and C are not instantiated but left as dicts:
A<{'x': {'__custom__': True, '__module__': '__main__', '__name__': 'B', 'data': {'b': {'int': 123, 'c': {'__custom__': True, '__module__': '__main__', '__name__': 'C', 'data': {'c': 133}}}, 'l': [123, {'__custom__': True, '__module__': '__main__', '__name__': 'C', 'data': {'c': 133}}], 't': [234, {'__custom__': True, '__module__': '__main__', '__name__': 'C', 'data': {'c': 133}}], 's': 'Blah'}}}>
Note that if the type contains a collection such as a list or tuple, the actual type of the collection cannot be restored during decoding. This is because all such collections are converted into lists when encoded to JSON.
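The loss of collection types is easy to see with the standard json module alone. The following sketch assumes a hypothetical Point named tuple; it shows that tuples and named tuples come back as plain lists, and that a named tuple can be rebuilt by hand either from the list or from its _asdict() form.

```python
import json
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])  # hypothetical named tuple type

# A plain tuple round-trips as a list.
round_tripped_tuple = json.loads(json.dumps((1, 2)))

# A named tuple is also written as a JSON array, so field names are lost.
as_list = json.loads(json.dumps(Point(1, 2)))

# Encoding the _asdict() form keeps the field names in the JSON text.
as_dict = json.loads(json.dumps(Point(1, 2)._asdict()))

# Either form can be turned back into a Point explicitly.
rebuilt_from_list = Point(*as_list)
rebuilt_from_dict = Point(**as_dict)
```

A __json_decode__ hook on the owning class is one place where such a rebuild step could live.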