Suppose I have the following JSON:
[
{"id":1,"text":"some text","user_id":1},
{"id":1,"text":"some text","user_id":2},
...
]
What would be an appropriate Avro schema for this array of objects?
[short answer]
An appropriate Avro schema for this array of objects, written with the avsc package, looks like this:
const avro = require('avsc');
const type = avro.Type.forSchema({
  type: 'array',
  items: {
    type: 'record',
    fields: [
      { name: 'id', type: 'int' },
      { name: 'text', type: 'string' },
      { name: 'user_id', type: 'int' }
    ]
  }
});
[long answer]
We can let Avro build the schema above from a sample data object.
Let's use the npm package "avsc", which describes itself as a "Pure JavaScript implementation of the Avro specification".
Since avsc can infer a value's schema, we can use the following trick to get a schema from sample data. In the output below the nested field definitions appear as [Object]; that is console.log truncating deeply nested objects, not a limitation of avsc. One workaround is to ask twice: once for the top-level structure (the array) and once for an array element:
// Install avsc first:
//   npm install avsc
const avro = require('avsc');
// avsc can infer a value's schema
const type = avro.Type.forValue([
  {"id":1,"text":"some text","user_id":1}
]);
const type2 = avro.Type.forValue(
  {"id":1,"text":"some text","user_id":1}
);
console.log(type.getSchema());
console.log(type2.getSchema());
Output:
{ type: 'array',
items: { type: 'record', fields: [ [Object], [Object], [Object] ] } }
{ type: 'record',
fields:
[ { name: 'id', type: 'int' },
{ name: 'text', type: 'string' },
{ name: 'user_id', type: 'int' } ] }
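The [Object] placeholders in the first output come from Node's default inspection depth, not from the schema itself. A minimal sketch, using only Node's built-in util module and a plain object that mirrors the inferred schema (the variable names are illustrative, and avsc is deliberately left out so the snippet runs on its own), shows two ways to print every level:

```javascript
const util = require('util');

// Plain object mirroring the inferred schema (no avsc needed for this demo).
const schema = {
  type: 'array',
  items: {
    type: 'record',
    fields: [
      { name: 'id', type: 'int' },
      { name: 'text', type: 'string' },
      { name: 'user_id', type: 'int' }
    ]
  }
};

// console.log uses the default inspection depth of 2, so the field
// objects, which sit one level deeper, collapse to [Object].
const truncated = util.inspect(schema);

// depth: null removes the limit and prints every nested level.
const full = util.inspect(schema, { depth: null });

// JSON.stringify also prints the whole structure, as valid JSON.
const asJson = JSON.stringify(schema, null, 2);

console.log(truncated);
console.log(full);
console.log(asJson);
```

The same approach should work on the object returned by type.getSchema(), since it is an ordinary JavaScript object.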
Now let's compose a proper schema and use it to serialize an object and then deserialize it back:
const avro = require('avsc');
const fs = require('fs');
const type = avro.Type.forSchema({
  type: 'array',
  items: {
    type: 'record',
    fields: [
      { name: 'id', type: 'int' },
      { name: 'text', type: 'string' },
      { name: 'user_id', type: 'int' }
    ]
  }
});
const buf = type.toBuffer([
  {"id":1,"text":"some text","user_id":1},
  {"id":1,"text":"some text","user_id":2}
]); // Encoded buffer.
const val = type.fromBuffer(buf); // Decoded value.
console.log("deserialized object: ", JSON.stringify(val, null, 4)); // pretty-print the deserialized result
// Save the encoded bytes so they can be inspected from the shell.
const full_filename = "/tmp/avro_buf.dat";
fs.writeFile(full_filename, buf, function(err) {
  if (err) {
    return console.log(err);
  }
  console.log("The file was saved to '" + full_filename + "'");
});
Output:
deserialized object: [
{
"id": 1,
"text": "some text",
"user_id": 1
},
{
"id": 1,
"text": "some text",
"user_id": 2
}
]
The file was saved to '/tmp/avro_buf.dat'
We can even inspect the compact binary representation produced by the exercise above:
hexdump -C /tmp/avro_buf.dat
00000000 04 02 12 73 6f 6d 65 20 74 65 78 74 02 02 12 73 |...some text...s|
00000010 6f 6d 65 20 74 65 78 74 04 00 |ome text..|
0000001a
Nice, isn't it? :-)
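To see why the file is only 26 bytes, the hexdump above can be decoded by hand. The following is a rough sketch, not avsc's own decoder: it assumes the array-of-records schema from this answer, hard-codes the field order (id, text, user_id), and uses illustrative helper names (readInt, readString, readArray). Avro writes ints as zig-zag encoded variable-length integers, strings as a length followed by UTF-8 bytes, and arrays as blocks of items terminated by a zero count.

```javascript
// The 26 bytes from the hexdump, copied verbatim.
const buf = Buffer.from(
  ('04 02 12 73 6f 6d 65 20 74 65 78 74 02 02 12 73 ' +
   '6f 6d 65 20 74 65 78 74 04 00').replace(/\s/g, ''),
  'hex'
);

let pos = 0; // current read offset into buf

// Read one Avro int: base-128 varint, then undo the zig-zag mapping.
// Sketch only: 32-bit operators, so it is meant for small values like these.
function readInt() {
  let raw = 0;
  let shift = 0;
  let byte;
  do {
    byte = buf[pos++];
    raw |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return (raw >>> 1) ^ -(raw & 1);
}

// Read one Avro string: byte length, then that many UTF-8 bytes.
function readString() {
  const len = readInt();
  const s = buf.toString('utf8', pos, pos + len);
  pos += len;
  return s;
}

// Read an array of records: blocks of items until a block count of 0.
function readArray() {
  const items = [];
  let count;
  while ((count = readInt()) !== 0) {
    if (count < 0) {
      // A negative count means the block's byte size follows; skip it.
      count = -count;
      readInt();
    }
    for (let i = 0; i < count; i++) {
      // Fields are stored in schema order, with no names or tags.
      items.push({ id: readInt(), text: readString(), user_id: readInt() });
    }
  }
  return items;
}

const decoded = readArray();
console.log(JSON.stringify(decoded));
console.log('bytes consumed:', pos);
```

Read this way, the leading 04 is the block count 2, each record is 02 (the int 1), 12 (length 9) plus "some text", then 02 or 04 for user_id, and the trailing 00 ends the array. No field names are stored, which is why the reader needs the schema.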
Regarding your question, a correct schema is:
{
"name": "Name",
"type": "array",
"namespace": "com.hi.avro.model",
"items": {
"name": "NameDetails",
"type": "record",
"fields": [
{
"name": "id",
"type": "int"
},
{
"name": "text",
"type": "string"
},
{
"name": "user_id",
"type": "int"
}
]
}
}