
Create a class that supports JSON serialization for use with Celery

I'm using Celery to run some background tasks. One of the tasks returns a Python class I created. I want to use JSON to serialize and deserialize this class, given the warnings about using pickle.

Is there a simple built-in way to achieve this?

The class is very simple: it contains 3 attributes, all of which are lists of named tuples. It also has a couple of methods that perform some calculations on the attributes.

My idea is to serialize/deserialize the 3 attributes, since that defines the class.

This is my idea for the encoder, but I'm not sure how to decode the data again?

import json

class JSONSerializable(object):
    def __repr__(self):
        return json.dumps(self.__dict__)

class MySimpleClass(JSONSerializable):
    def __init__(self, p1, p2, p3): # I only care about p1, p2, p3
        self.p1 = p1
        self.p2 = p2
        self.p3 = p3
        self.abc = p1 + p2 + p3

    def some_calc(self):
        ...

First, and no less important: the warnings against pickle mainly apply if third parties could inject pickled data into your worker stream. If you are certain that your own system creates all the pickled data it consumes, there is no security problem. As for compatibility, it is relatively easy to handle, and automatic if the producers and consumers of your pickle files run the same Python version.
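To make the pickle warning concrete, here is a minimal sketch of why untrusted pickles are a problem: the pickled data itself names the callable that the loader will invoke. The `Payload` class is hypothetical, and `int` is used as a harmless stand-in for something dangerous such as `os.system`.

```python
import pickle


class Payload:
    # Hypothetical attacker-controlled object. __reduce__ tells pickle
    # which callable to invoke, and with which arguments, on load.
    def __reduce__(self):
        return (int, ('42',))  # harmless stand-in for a dangerous call


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance: it simply calls int('42').
result = pickle.loads(blob)
print(type(result).__name__, result)  # int 42
```

If every pickle on the stream is produced by your own code, nobody can slip in a payload like this, which is the point made above.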

That said, for JSON, you have to create subclasses of Python's json.JSONEncoder and json.JSONDecoder. The encoder subclass has to be passed as the cls argument to all your json.dump(s) calls, and the decoder subclass to all your json.load(s) calls.

A suggestion is that the encoder's default method emits a dictionary holding the class's __module__, its __name__, and an identifier key, say __custom__, that marks it for custom decoding, with the object's data stored under a "data" key.

On the decoder, you check for the __custom__ key, then instantiate the class using its __new__ method and populate its __dict__. As with pickle, side effects triggered in the class's __init__ won't run.
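The point about `__init__` being skipped can be seen in isolation. This is a small sketch with a hypothetical `Connection` class whose constructor has a visible side effect (a counter); it is not part of the encoder or decoder itself.

```python
class Connection:
    # Hypothetical class whose __init__ has a side effect.
    opened = 0

    def __init__(self, host):
        Connection.opened += 1  # side effect we want to observe
        self.host = host


# Normal construction runs __init__ and bumps the counter.
first = Connection('example.org')

# Decoder-style construction: allocate the instance without calling
# __init__, then restore the saved attributes directly.
second = Connection.__new__(Connection)
second.__dict__.update({'host': 'example.org'})

print(Connection.opened)  # 1
print(second.host)        # example.org
```

Any attribute that `__init__` would normally derive or validate has to be present in the saved data, because nothing recomputes it on the decoding side.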

You can later on enhance your decoder and encoder so that, for example, they search the class for a __json_encode__ method that could handle only the desired attributes.
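As a rough sketch of what such hooks might look like for the class in the question: it assumes `__json_encode__` is an ordinary method returning JSON-ready data and `__json_decode__` is a classmethod receiving that data back. The `Point` named tuple is hypothetical, and the hooks are called directly here so the example does not depend on any particular encoder.

```python
import json
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])  # hypothetical named tuple

FIELDS = ('p1', 'p2', 'p3')


class MySimpleClass:
    def __init__(self, p1, p2, p3):
        self.p1 = p1
        self.p2 = p2
        self.p3 = p3
        self.abc = p1 + p2 + p3  # derived value, rebuilt by __init__

    def __json_encode__(self):
        # Serialize only the three defining attributes; _asdict() keeps
        # the field names, which a plain JSON array would lose.
        return {name: [pt._asdict() for pt in getattr(self, name)]
                for name in FIELDS}

    @classmethod
    def __json_decode__(cls, data):
        # Rebuild the named tuples and go through __init__ so that
        # `abc` is recomputed instead of being stored.
        lists = [[Point(**d) for d in data[name]] for name in FIELDS]
        return cls(*lists)


original = MySimpleClass([Point(1, 2)], [Point(3, 4)], [Point(5, 6)])
text = json.dumps(original.__json_encode__())
restored = MySimpleClass.__json_decode__(json.loads(text))
```

Routing decoding through `__init__` is a design choice: it trades the pickle-like "no side effects" behavior for keeping derived attributes consistent.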

Sample implementation:

import json

class GenericJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return super().default(obj)
        except TypeError:
            pass
        cls = type(obj)
        result = {
            '__custom__': True,
            '__module__': cls.__module__,
            '__name__': cls.__name__,
            # Call the hook; referencing it without () would store the bound method itself.
            'data': obj.__dict__ if not hasattr(cls, '__json_encode__') else obj.__json_encode__()
        }
        return result


class GenericJSONDecoder(json.JSONDecoder):
    def decode(self, s):  # `s` avoids shadowing the built-in `str`
        result = super().decode(s)
        if not isinstance(result, dict) or not result.get('__custom__', False):
            return result
        import sys
        module = result['__module__']
        if module not in sys.modules:
            __import__(module)
        cls = getattr(sys.modules[module], result['__name__'])
        if hasattr(cls, '__json_decode__'):
            return cls.__json_decode__(result['data'])
        instance = cls.__new__(cls)
        instance.__dict__.update(result['data'])
        return instance

Interactive test on the console:

In [36]: class A:
    ...:     def __init__(self, a):
    ...:         self.a = a
    ...:         

In [37]: a = A('test')

In [38]: b = json.loads(json.dumps(a, cls=GenericJSONEncoder),  cls=GenericJSONDecoder)

In [39]: b.a
Out[39]: 'test'

Here is an improved version of the great solution provided by @jsbueno which also works with nested custom types.

import json
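import collections.abc  # collections.Iterable was removed in Python 3.10

def is_iterable(arg):
    # Strings are iterable too, but must be treated as plain values here.
    return isinstance(arg, collections.abc.Iterable) and not isinstance(arg, str)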


class GenericJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            return super().default(obj)
        except TypeError:
            pass
        cls = type(obj)
        result = {
            '__custom__': True,
            '__module__': cls.__module__,
            '__name__': cls.__name__,
            # Call the hook; referencing it without () would store the bound method itself.
            'data': obj.__dict__ if not hasattr(cls, '__json_encode__') else obj.__json_encode__()
        }
        return result


class GenericJSONDecoder(json.JSONDecoder):
    def decode(self, s):  # `s` avoids shadowing the built-in `str`
        result = super().decode(s)
        return GenericJSONDecoder.instantiate_object(result)

    @staticmethod
    def instantiate_object(result):
        if not isinstance(result, dict):
            if is_iterable(result):
                return [GenericJSONDecoder.instantiate_object(v) for v in result]
            else:
                return result

        if not result.get('__custom__', False):
            return {k: GenericJSONDecoder.instantiate_object(v) for k, v in result.items()}

        import sys
        module = result['__module__']
        if module not in sys.modules:
            __import__(module)
        cls = getattr(sys.modules[module], result['__name__'])
        if hasattr(cls, '__json_decode__'):
            return cls.__json_decode__(result['data'])
        instance = cls.__new__(cls)
        data = {k: GenericJSONDecoder.instantiate_object(v) for k, v in result['data'].items()}
        instance.__dict__.update(data)
        return instance


class C:

    def __init__(self):
        self.c = 133

    def __repr__(self):
        return "C<" + str(self.__dict__) + ">"


class B:

    def __init__(self):
        self.b = {'int': 123, "c": C()}
        self.l = [123, C()]
        self.t = (234, C())
        self.s = "Blah"

    def __repr__(self):
        return "B<" + str(self.__dict__) + ">"


class A:
    class_y = 13

    def __init__(self):
        self.x = B()

    def __repr__(self):
        return "A<" + str(self.__dict__) + ">"


def dumps(obj, *args, **kwargs):
    return json.dumps(obj, *args, cls=GenericJSONEncoder, **kwargs)


def dump(obj, *args, **kwargs):
    return json.dump(obj, *args, cls=GenericJSONEncoder, **kwargs)


def loads(obj, *args, **kwargs):
    return json.loads(obj, *args, cls=GenericJSONDecoder, **kwargs)


def load(obj, *args, **kwargs):
    return json.load(obj, *args, cls=GenericJSONDecoder, **kwargs)

Check it out:

e = dumps(A())
print("ENCODED:\n\n", e)
b = loads(e)  # same as json.loads(e, cls=GenericJSONDecoder)
print("\nDECODED:\n\n", b)

The decoded object prints as:

A<{'x': B<{'b': {'int': 123, 'c': C<{'c': 133}>}, 'l': [123, C<{'c': 133}>], 't': [234, C<{'c': 133}>], 's': 'Blah'}>}>

The original version reconstructs only the outer A correctly; the nested instances of B and C are not instantiated and are left as dicts:

A<{'x': {'__custom__': True, '__module__': '__main__', '__name__': 'B', 'data': {'b': {'int': 123, 'c': {'__custom__': True, '__module__': '__main__', '__name__': 'C', 'data': {'c': 133}}}, 'l': [123, {'__custom__': True, '__module__': '__main__', '__name__': 'C', 'data': {'c': 133}}], 't': [234, {'__custom__': True, '__module__': '__main__', '__name__': 'C', 'data': {'c': 133}}], 's': 'Blah'}}}>
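Another way to handle nesting, sketched here as an alternative and not taken from either answer, is the `object_hook` argument of `json.loads`. It is called for every decoded JSON object, innermost first, so nested custom types are rebuilt without manual recursion. The `Inner` and `Outer` classes and the `REGISTRY` whitelist are assumptions made for this example; the whitelist replaces the import-by-name lookup so that the payload cannot pick arbitrary classes.

```python
import json


class Inner:
    def __init__(self, value):
        self.value = value


class Outer:
    def __init__(self):
        self.items = [Inner(1), Inner(2)]


# Hypothetical whitelist of classes that may be reconstructed.
REGISTRY = {'Inner': Inner, 'Outer': Outer}


def encode_custom(obj):
    """`default=` callback: wrap an otherwise unserializable object."""
    return {'__custom__': True,
            '__name__': type(obj).__name__,
            'data': vars(obj)}


def decode_custom(dct):
    """`object_hook=` callback: runs on every decoded JSON object."""
    if not dct.get('__custom__'):
        return dct  # ordinary dict, leave untouched
    cls = REGISTRY[dct['__name__']]
    instance = cls.__new__(cls)  # skip __init__, as in the answers above
    instance.__dict__.update(dct['data'])
    return instance


text = json.dumps(Outer(), default=encode_custom)
restored = json.loads(text, object_hook=decode_custom)
print([item.value for item in restored.items])  # [1, 2]
```

Because the hook sees inner objects before outer ones, `dct['data']` already contains rebuilt `Inner` instances by the time the `Outer` envelope is processed.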

Note that if the type contains a collection such as a list or tuple, the actual type of the collection cannot be restored during decoding, because all of these collections are converted into JSON arrays when encoded and come back as lists.
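A short illustration of that loss, which matters for the question because named tuples are tuples too. The `Point` named tuple is hypothetical.

```python
import json
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])  # hypothetical named tuple

original = {'pair': (1, 2), 'points': [Point(3, 4)]}
decoded = json.loads(json.dumps(original))

print(decoded)  # {'pair': [1, 2], 'points': [[3, 4]]}

# Rebuilding has to be done by code that knows the intended types.
pair = tuple(decoded['pair'])
points = [Point(*item) for item in decoded['points']]
```

Tuples and named tuples are serialized natively as arrays, so they never reach the encoder's `default` method and get no `__custom__` envelope.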

