
Hadoop: querying/reading Avro files

I'm storing data that was converted from complex JSON objects to Avro format.

Each JSON object contains nested objects and arrays of objects. The Avro schema looks like this:

{
    "type" : "record",
    "name" : "userInfo",
    "namespace" : "my.example",
    "fields" : [
        {"name" : "username",
         "type" : "string",
         "default" : "NONE"},

        {"name" : "age",
         "type" : "int",
         "default" : -1},

        {"name" : "phone",
         "type" : "string",
         "default" : "NONE"},

        {"name" : "housenum",
         "type" : "string",
         "default" : "NONE"},

        {"name" : "address",
         "type" : {
             "type" : "record",
             "name" : "mailing_address",
             "fields" : [
                 {"name" : "street",
                  "type" : "string",
                  "default" : "NONE"},

                 {"name" : "city",
                  "type" : "string",
                  "default" : "NONE"},

                 {"name" : "state_prov",
                  "type" : "string",
                  "default" : "NONE"},

                 {"name" : "country",
                  "type" : "string",
                  "default" : "NONE"},

                 {"name" : "zip",
                  "type" : "string",
                  "default" : "NONE"}
             ]
         },
         "default" : {}}
    ]
}

I use NiFi to convert the JSON to Avro and to store the serialized files in Hadoop (currently I just use plain HDFS):

[screenshot of the NiFi flow]

My question:

For test purposes, I would like to query the data stored in HDFS (in Avro format).

At this point I'm a bit confused, because there are so many tools and technologies around Hadoop. What is the right way to do this? Which tools and workflow should I use?

You should be able to create an external Hive table on top of the HDFS location where you wrote the Avro data.
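For example, a minimal DDL sketch using the Hive AvroSerDe. The table name, the LOCATION directory, and the schema file path are assumptions made up for illustration, not taken from the original post; adjust them to wherever NiFi writes the Avro files and wherever you upload the .avsc schema above.

-- Assumed paths: Avro files in /data/avro/userinfo,
-- schema uploaded to hdfs:///schemas/userInfo.avsc
CREATE EXTERNAL TABLE user_info
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/data/avro/userinfo'
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///schemas/userInfo.avsc');

-- On Hive 0.14+, the shorthand STORED AS AVRO can replace the
-- SERDE/INPUTFORMAT/OUTPUTFORMAT lines above.

Hive derives the table's columns from the Avro schema, so the nested mailing_address record shows up as a struct column.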

These posts have examples:

https://community.hortonworks.com/questions/22135/is-there-a-way-to-create-hive-table-based-on-avro.html

https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
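Once the external table exists, nested record fields can be read with dot notation. A usage sketch against the hypothetical user_info table from above:

-- Query nested fields of the address struct with dot notation
SELECT username, age, address.city, address.country
FROM user_info
WHERE age > 30
LIMIT 10;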
