Python JSON serialize excluding certain fields

Summary

I have a Python object hierarchy that I want to serialize to JSON using only the standard library's json module (https://docs.python.org/3/library/json.html), not any third-party library. I want to exclude certain fields/properties/sub-objects, and I'm finding it surprisingly difficult to find a simple answer on how to achieve this.

Example

I'll have a derived class instance ending up like this:

class MyItemClass(BaseItemClass):
    def __init__(self):
        super().__init__()
        self.saveThisProperty = 999
        self.dontSaveThisProperty = "Something"
        self.saveThisObject = ObjectType1()
        self.dontSaveThisObject = ObjectType2()

If I were serializing to XML, I would want it to look like

<MyItemClass>
    <saveThisProperty>999</saveThisProperty>
    <saveThisObject>
        ...
    </saveThisObject>
</MyItemClass>

Note that I only serialize certain properties/sub-objects, and I do not want to serialize the whole BaseItemClass from which my class instance is derived.

In XML I'm fine. I know how to output bits of XML as I go along for what I do want, either to a temporary in-memory document which I save at the end, or by outputting individual nodes/elements to the stream incrementally. I don't have to serialize everything. E.g.:

xmlStream.writeStartElement("MyItemClass")
    xmlStream.writeElementWithValue("saveThisProperty", 999)
    xmlStream.writeStartElement("saveThisObject")
        ...
    xmlStream.writeEndElement("saveThisObject")
xmlStream.writeEndElement("MyItemClass")

For JSON I can't do this, can I? Do I have to create a new, "standalone" object hierarchy (with no derivation from BaseItemClass) by copying just the properties/sub-objects I want into it, and then JSON-serialize that?

I did see there is json.dump(default = ...) , but that says:

If specified, default should be a function that gets called for objects that can't otherwise be serialized. It should return a JSON encodable version of the object

However, my problem is not that the original objects cannot be serialized by the default Python->JSON conversion. It is that I do not want that default, serialize-everything behaviour; I want my own "selective" one.
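For reference, here is a minimal sketch of the behaviour the quoted passage describes: the `default` hook is invoked only for values the encoder cannot handle natively, and whatever it returns is encoded in their place. The `Point` class, the `seen` list and the `selective_default` function are hypothetical names used purely for illustration.

```python
import json

class Point:
    # Hypothetical class: `label` is the field we want to leave out
    def __init__(self, x, y, label):
        self.x = x
        self.y = y
        self.label = label

seen = []  # records the type name of every object passed to the hook

def selective_default(obj):
    seen.append(type(obj).__name__)
    if isinstance(obj, Point):
        # Return only the fields that should appear in the output
        return {"x": obj.x, "y": obj.y}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

result = json.dumps(
    {"count": 1, "origin": Point(0, 0, "skip me")},
    default=selective_default,
)
```

After this runs, `seen` contains only `Point`: the dict, the string keys and the int never reach the hook. For instances of ordinary custom classes, the hook is therefore the place where a "selective" representation can be chosen.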

I am the OP. I post here for clarity what I have ended up using for my case.

I have marked @Sina Rezaei's post in this thread as the Accepted Solution, since that (the last section in his post) and @snakechamerb's comments inspired me to understand what is required.

The outline of my solution looks like:

class ModelScene(QGraphicsScene):

  # Serialize whole scene to JSON into stream
  def json_serialize(self, stream) -> None:
    # `json.dump()` calls `ModelScene.json_serialize_dump_obj()` for every object it cannot serialize natively
    json.dump(self, stream, indent=4, default=ModelScene.json_serialize_dump_obj)

  # Static method passed as `json.dump(default=ModelScene.json_serialize_dump_obj)`
  # `json` calls it only for objects it cannot serialize natively
  @staticmethod
  def json_serialize_dump_obj(obj):
    # if object has a `json_dump_obj()` method call that...
    if hasattr(obj, "json_dump_obj"):
      return obj.json_dump_obj()
    # ...else report the object as unserializable, as a `default` hook is expected to
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

  # Return dict object suitable for serialization via json.dump()
  # This one is in `ModelScene(QGraphicsScene)` class
  def json_dump_obj(self) -> dict:
    return {
      "_classname_": self.__class__.__name__,
      "node_data": self.node_data
      }

class CanvasModelData(QAbstractListModel):

  # Return dict object suitable for serialization via json.dump()
  # This one is class CanvasModelData(QAbstractListModel)
  def json_dump_obj(self) -> dict:
    _data = {}
    for key, value in self._data.items():
      _data[key] = value
    return {
      "_classname_": self.__class__.__name__,
      "data_type": self.data_type,
      "_data": _data
      }
  • Every "complex" class defines a def json_dump_obj(self) -> dict: method.
  • That method returns just the properties/sub-objects wanted in the serialization.
  • The top-level json.dump(self, stream, default=ModelScene.json_serialize_dump_obj) writes the output to stream incrementally. For every object that json cannot serialize natively, it calls the static method ModelScene.json_serialize_dump_obj, which in turn calls obj.json_dump_obj() if available. Basic types (dict, list, str, numbers, ...) get the default JSON serialization.
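
The pattern in the bullets above can be tried without Qt. This is a minimal sketch under stated assumptions: `Scene` and `NodeData` are hypothetical stand-ins for `ModelScene` and `CanvasModelData`, the `view` and `cache` attributes are invented to represent things that should not be saved, and the hook raises `TypeError` for objects that have no `json_dump_obj()` method.

```python
import io
import json

def json_serialize_dump_obj(obj):
    # json calls this only for objects it cannot encode natively
    if hasattr(obj, "json_dump_obj"):
        return obj.json_dump_obj()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

class NodeData:
    # Stand-in for CanvasModelData; `cache` is deliberately not serialized
    def __init__(self):
        self.data_type = "demo"
        self._data = {"a": 1}
        self.cache = object()

    def json_dump_obj(self) -> dict:
        return {
            "_classname_": self.__class__.__name__,
            "data_type": self.data_type,
            "_data": dict(self._data),
        }

class Scene:
    # Stand-in for ModelScene; `view` is deliberately not serialized
    def __init__(self):
        self.node_data = NodeData()
        self.view = object()

    def json_dump_obj(self) -> dict:
        # node_data is returned as an object; the hook is called again for it
        return {"_classname_": self.__class__.__name__, "node_data": self.node_data}

    def json_serialize(self, stream) -> None:
        json.dump(self, stream, indent=4, default=json_serialize_dump_obj)

stream = io.StringIO()
Scene().json_serialize(stream)
loaded = json.loads(stream.getvalue())
```

Reading the output back into `loaded` gives nested dicts containing only the keys returned by the two `json_dump_obj()` methods. Neither `view` nor `cache` appears anywhere in it.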

Interestingly, I came across someone with the same concerns as me. From "What is the difference between json.dump() and json.dumps() in python?", answer https://stackoverflow.com/a/57087055/489865 :

In memory usage and speed.

When you call jsonstr = json.dumps(mydata) it first creates a full copy of your data in memory and only then you file.write(jsonstr) it to disk. So this is a faster method but can be a problem if you have a big piece of data to save.

When you call json.dump(mydata, file) -- without 's' , new memory is not used, as the data is dumped by chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps() and also tested both the variants measuring the time with time.time() and watching the memory usage in htop.
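
The chunked writing described in that quoted answer can be observed directly. This is a rough sketch, assuming CPython's current implementation of `json.dump()`; `CountingWriter` is a hypothetical minimal file-like object, and the sketch does not attempt to reproduce the timing or memory figures quoted above.

```python
import json

class CountingWriter:
    # Minimal file-like object: json.dump() only needs a write() method
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

data = {"items": list(range(5)), "name": "example"}

writer = CountingWriter()
json.dump(data, writer)        # passes the output to writer.write() piece by piece
as_string = json.dumps(data)   # builds the complete string in memory first

write_calls = len(writer.chunks)
same_output = "".join(writer.chunks) == as_string
```

Here `write_calls` is greater than one while `same_output` is true, so the two functions produce the same text and differ only in how it is delivered.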

I can think of three solutions for your situation:

Solution 1: Use the third-party Pykson library and define the fields you want serialized as Pykson fields.

Sample:

import pykson

class MyItemClass(pykson.JsonObject):
    saved_property = pykson.IntegerField()

my_object = MyItemClass(saved_property=1, accept_unknown=True)
my_object.unsaved_property = 2
pykson.Pykson().to_json(my_object)

Disclaimer: I am the developer of the Pykson library.

Solution 2: The second solution is to use a wrapper class with a custom default serializer.

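import json

class ObjectWrapper:
    def __init__(self, value, should_serialize=False):
        self.value = value
        self.should_serialize = should_serialize

def default_handler(obj):
    # json calls this only for objects it cannot serialize natively
    if isinstance(obj, ObjectWrapper):
        if obj.should_serialize:
            return obj.value
        else:
            # excluded values are written as null; the key itself is kept
            return None
    else:
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Example data for illustration; gives '{"kept": 1, "dropped": null}'
json.dumps({"kept": ObjectWrapper(1, True), "dropped": ObjectWrapper(2)}, default=default_handler)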
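
Note that a `default` handler can only substitute a value. It cannot remove the key that refers to it, so excluded wrappers in Solution 2 still show up as `null`. If the keys should disappear entirely, one option is to filter the structure before encoding. This is a minimal sketch, assuming a hypothetical `strip_excluded` helper that is not part of the original answer:

```python
import json

class ObjectWrapper:
    def __init__(self, value, should_serialize=False):
        self.value = value
        self.should_serialize = should_serialize

def is_excluded(item):
    # True only for wrappers that were marked as "do not serialize"
    return isinstance(item, ObjectWrapper) and not item.should_serialize

def strip_excluded(value):
    # Hypothetical helper: rebuilds dicts and lists without excluded wrappers.
    # A wrapper that reaches this branch was not filtered out by its container
    # (or is the top-level value), so it is simply unwrapped.
    if isinstance(value, ObjectWrapper):
        return strip_excluded(value.value)
    if isinstance(value, dict):
        return {
            key: strip_excluded(item)
            for key, item in value.items()
            if not is_excluded(item)
        }
    if isinstance(value, (list, tuple)):
        return [strip_excluded(item) for item in value if not is_excluded(item)]
    return value

data = {
    "kept": ObjectWrapper(999, True),
    "dropped": ObjectWrapper("Something"),
    "nested": [ObjectWrapper(1, True), ObjectWrapper(2)],
}
text = json.dumps(strip_excluded(data))
```

The trade-off is that this builds a filtered copy of the containers before encoding, which is what the question hoped to avoid for large hierarchies.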

Solution 3: It might be a bad idea for a deep hierarchy, but you can also add a method to all classes that will be serialized, use that method to get a dictionary, and then easily convert the dictionary to JSON.

import json

class MyChildClass:
    def __init__(self, serialized_property, not_serialized_property):
        self.serialized_property = serialized_property
        self.not_serialized_property = not_serialized_property

    def to_dict(self):
        # only add serialized property here
        return {
            "serialized_property": self.serialized_property
        }

class MyParentClass:
    def __init__(self, child_property, some_other_property):
        self.child_property = child_property
        self.some_other_property = some_other_property

    def to_dict(self):
        return {
            'child_property': self.child_property.to_dict(),
            'some_other_property': self.some_other_property
        }

my_child_object = MyChildClass(serialized_property=1, not_serialized_property=2)
my_parent_object = MyParentClass(child_property=my_child_object, some_other_property='some string here')
json.dumps(my_parent_object.to_dict())

Or you can achieve the same result using a default handler:

import json

class MyChildClass:
    def __init__(self, serialized_property, not_serialized_property):
        self.serialized_property = serialized_property
        self.not_serialized_property = not_serialized_property

    def to_dict(self):
        # only add serialized property here
        return {
            "serialized_property": self.serialized_property
        }

class MyParentClass:
    def __init__(self, child_property, some_other_property):
        self.child_property = child_property
        self.some_other_property = some_other_property

    def to_dict(self):
        return {
            'child_property': self.child_property,
            'some_other_property': self.some_other_property
        }

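def handle_default(obj):
    # json calls this only for objects it cannot serialize natively
    if isinstance(obj, MyChildClass):
        return obj.to_dict()
    elif isinstance(obj, MyParentClass):
        return obj.to_dict()
    # raise instead of returning None, which would silently be written as null
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")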

my_child_object = MyChildClass(serialized_property=1, not_serialized_property=2)
my_parent_object = MyParentClass(child_property=my_child_object, some_other_property='some string here')
json.dumps(my_parent_object, default=handle_default)
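
The same idea can also be packaged as a `json.JSONEncoder` subclass and passed via the `cls` argument, which avoids listing every class in the handler. This is a minimal sketch, assuming every serializable class exposes a `to_dict()` method; `ToDictEncoder`, `Child` and `Parent` are hypothetical names for illustration.

```python
import json

class ToDictEncoder(json.JSONEncoder):
    # Hypothetical encoder: defers to to_dict() on any object that defines it
    def default(self, obj):
        to_dict = getattr(obj, "to_dict", None)
        if callable(to_dict):
            return to_dict()
        # Let the base class raise TypeError for anything else
        return super().default(obj)

class Child:
    def __init__(self, serialized_property, not_serialized_property):
        self.serialized_property = serialized_property
        self.not_serialized_property = not_serialized_property

    def to_dict(self):
        # only the serialized property is included
        return {"serialized_property": self.serialized_property}

class Parent:
    def __init__(self, child_property, some_other_property):
        self.child_property = child_property
        self.some_other_property = some_other_property

    def to_dict(self):
        # the child is returned as an object; the encoder calls default() for it
        return {
            "child_property": self.child_property,
            "some_other_property": self.some_other_property,
        }

parent = Parent(Child(1, 2), "some string here")
text = json.dumps(parent, cls=ToDictEncoder)
```

Because the check is duck-typed, new classes only need a `to_dict()` method to take part. Objects without one still fail with the usual `TypeError` rather than being written as `null`.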
