I'm storing data that was imported from a complex JSON object into Avro format.
The JSON object consists of nested objects and an array of objects. The Avro schema looks like this:
{
  "type" : "record",
  "name" : "userInfo",
  "namespace" : "my.example",
  "fields" : [
    { "name" : "username",   "type" : "string", "default" : "NONE" },
    { "name" : "age",        "type" : "int",    "default" : -1 },
    { "name" : "phone",      "type" : "string", "default" : "NONE" },
    { "name" : "housenum",   "type" : "string", "default" : "NONE" },
    { "name" : "address",
      "type" : {
        "type" : "record",
        "name" : "mailing_address",
        "fields" : [
          { "name" : "street",     "type" : "string", "default" : "NONE" },
          { "name" : "city",       "type" : "string", "default" : "NONE" },
          { "name" : "state_prov", "type" : "string", "default" : "NONE" },
          { "name" : "country",    "type" : "string", "default" : "NONE" },
          { "name" : "zip",        "type" : "string", "default" : "NONE" }
        ]
      },
      "default" : {}
    }
  ]
}
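For context, a sketch of what one input record for this schema might look like, checked with Python's standard-library json module (the field names come from the schema above; the values are made up):

```python
import json

# Hypothetical sample record; field names mirror the Avro schema,
# but every value here is invented for illustration.
sample = """
{
  "username": "jdoe",
  "age": 31,
  "phone": "555-0100",
  "housenum": "42",
  "address": {
    "street": "Main St",
    "city": "Springfield",
    "state_prov": "IL",
    "country": "US",
    "zip": "62701"
  }
}
"""

record = json.loads(sample)

# The top-level and nested keys should match the schema's field names.
top_fields = {"username", "age", "phone", "housenum", "address"}
addr_fields = {"street", "city", "state_prov", "country", "zip"}

assert set(record) == top_fields
assert set(record["address"]) == addr_fields
```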
I use NiFi to convert the JSON to Avro and to store the serialized files in Hadoop (currently I just use plain Hadoop).
My question:
For test purposes I would like to query the data stored in HDFS (in Avro format).
At this point I'm a bit confused, because there are so many tools and technologies around Hadoop. What is the right way to do this? Which tools and workflow?
You should be able to create an external Hive table on top of the HDFS location where you wrote the Avro data.
This post has examples:
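As a minimal sketch, the table definition could look like the following (the HDFS paths are placeholders for wherever NiFi writes the .avro files and wherever you store the userInfo.avsc schema file; adjust them to your layout):

```sql
-- Hive 0.14+ can read the column definitions from the Avro schema itself.
CREATE EXTERNAL TABLE user_info
STORED AS AVRO
LOCATION '/user/nifi/avro/userinfo'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/userInfo.avsc');

-- Nested record fields are addressed with dot notation:
SELECT username, address.city
FROM user_info
WHERE age > 21;
```

Once the external table exists, any Hive-compatible engine pointed at the metastore can query those files in place; no data is copied out of HDFS.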