
Python JSON REST framework with (de)serialiser and schema validation (jsonschema/avro)

I think my problem is very common: I want to build a JSON REST API in Python (and possibly R later) to exchange data between applications. (I don't want to use BSON or other binary formats at this point.)

Given the availability of schemas and (de-)serialization frameworks, I thought it would be straightforward to build a system that does the following:

  1. takes input from an HTTP GET request,
  2. converts/parses/deserializes it into a Python object,
  3. validates the Python object against a schema written in a common schema language,
  4. does something with the parameters to produce result data,
  5. serializes the result data to JSON,
  6. validates the result JSON, and
  7. returns it.
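A stdlib-only sketch of what steps 1-7 could look like end to end (the schema shape, the validator, and the handler are hypothetical placeholders, not a real framework):

```python
import json
from urllib.parse import parse_qs

# Hypothetical schema: field name -> expected Python type (illustrative only).
USER_SCHEMA = {"name": str, "age": int}

def validate(obj, schema):
    """Steps 3/6: check that each field in the schema is present with the right type."""
    for field, typ in schema.items():
        if not isinstance(obj.get(field), typ):
            raise ValueError(f"{field!r} is not a valid {typ.__name__}")

def handle(query_string):
    # 1-2: parse the query string and coerce each value per the schema
    raw = {k: v[0] for k, v in parse_qs(query_string).items()}
    obj = {k: USER_SCHEMA[k](v) for k, v in raw.items() if k in USER_SCHEMA}
    validate(obj, USER_SCHEMA)                                        # 3
    result = {"greeting": f"Hello {obj['name']}", "age": obj["age"]}  # 4
    payload = json.dumps(result)                                      # 5
    validate(json.loads(payload), {"greeting": str, "age": int})      # 6
    return payload                                                    # 7

print(handle("name=Mickey&age=4"))
# {"greeting": "Hello Mickey", "age": 4}
```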

I want to use a schema language that is language-agnostic - nothing that is used in only one language. I looked at Avro and JSON Schema as schema languages, and at the (de)serialization ecosystem around them, but so far I haven't found any tools that fit the bill. A particular problem is deserialization: there are libraries that serialize, but they then have trouble with URL GET parameters, which arrive as strings instead of integers.

?name=Mickey&age=4

given this Avro schema

{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}

I want to get a dict(name='Mickey', age=4).
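For illustration, a minimal stdlib deserializer driven by that Avro record schema (the Avro-to-Python type map is an assumption covering only the primitives used here):

```python
from urllib.parse import parse_qs

AVRO_SCHEMA = {
    "type": "record", "name": "User", "namespace": "example.avro",
    "fields": [{"name": "name", "type": "string"},
               {"name": "age", "type": "int"}],
}

# Assumed mapping from Avro primitive names to Python coercion functions.
COERCERS = {"string": str, "int": int, "long": int,
            "float": float, "double": float, "boolean": lambda s: s == "true"}

def deserialize_query(query_string, schema):
    """Coerce string-valued GET parameters to the types the schema declares."""
    raw = {k: v[0] for k, v in parse_qs(query_string).items()}
    return {f["name"]: COERCERS[f["type"]](raw[f["name"]])
            for f in schema["fields"]}

print(deserialize_query("name=Mickey&age=4", AVRO_SCHEMA))
# {'name': 'Mickey', 'age': 4}
```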

I've already lost a lot of time trying out different tools and frameworks, and I'm at the point where I'm considering writing a deserializer from scratch.

I looked around a lot without finding what I was looking for. Projects like Marshmallow came close: they allow schemas to be specified and provide serialization, but I wanted the jsonschema format for interoperability.

So I've written my own serializer/deserializer class around a pandas DataFrame, which represents the data. It has methods

  1. for creating a DataFrame from a dictionary or JSON object (deserialization),
  2. for coercing data types,
  3. for inferring a schema,
  4. for validating against the schema, and
  5. for serializing to JSON.
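A condensed sketch of the idea (assuming pandas is installed; the class and method names are illustrative, not the actual implementation):

```python
import json
import pandas as pd

class DataFrameSerializer:
    """Thin (de)serialization layer around a pandas DataFrame (illustrative)."""

    def __init__(self, records):
        # 1. deserialization: build a DataFrame from a dict or list of dicts
        self.df = pd.DataFrame(records)

    def coerce(self, dtypes):
        # 2. coerce column types, e.g. {"age": "int64"}
        self.df = self.df.astype(dtypes)
        return self

    def infer_schema(self):
        # 3. infer a (simplified) schema from the column dtypes
        return {col: str(dtype) for col, dtype in self.df.dtypes.items()}

    def validate(self, schema):
        # 4. validate the inferred schema against an expected one
        if self.infer_schema() != schema:
            raise ValueError("schema mismatch")

    def to_json(self):
        # 5. serialize back to JSON
        return self.df.to_json(orient="records")

s = DataFrameSerializer([{"name": "Mickey", "age": "4"}]).coerce({"age": "int64"})
print(s.to_json())
```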

It's not very clear what your problem with serialization and deserialization is. Most web frameworks do not treat URL query parameters as a JSON object; they treat each key separately and let you access the query string as a dictionary (or as arguments to the request handler, depending on the framework), something like request.query['name']. JSON usually appears as the payload of POST, PUT and PATCH requests and as the response body. Even then, you receive it as a string and call something like json.loads to obtain Python data structures, which you are then free to validate with a json-schema or Avro validator library.
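For example, with the jsonschema package, that loads-then-validate flow looks roughly like this (the schema and payload here are illustrative):

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

body = '{"name": "Mickey", "age": 4}'   # e.g. a POST payload
obj = json.loads(body)                  # JSON string -> Python dict
validate(instance=obj, schema=schema)   # raises ValidationError on mismatch

try:
    # fails: "4" is a string, not an integer, so the GET-params problem remains
    validate(instance={"name": "Mickey", "age": "4"}, schema=schema)
except ValidationError as e:
    print("invalid:", e.message)
```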

Two things come to my mind.

First, there's the Eve REST framework, built on top of Flask.

Its author also created (and uses in Eve) a schema/validation language called Cerberus, which does exactly what you want. Eve stores your JSONs in MongoDB.

Unless your application is meant to grow large and use the many tools already available in Django out of the box or as packages, I'd stick with Eve. (Oops, didn't intend to start the eternal Django-vs-Flask flame war here.)


Second, there's the MongoEngine library, which provides Django-style schemas for JSON documents.

It was originally intended to bring MongoDB support to Django, and its architecture mimics Django's. MongoEngine itself is quite stable, but its integration with the otherwise nice Django REST Framework, Django REST Framework-Mongoengine, is not by any means stable.

There are also MongoEngine bindings for Eve and Flask.
