I am using the Elephant Bird project to load a JSON file into Pig, but I am not sure how to define the schema at load time. I could not find any documentation on this.
data:
{"id":22522,"name":"Product1","colors":["Red","Blue"],"sizes":["S","M"]}
{"id":22523,"name":"Product2","colors":["White","Blue"],"sizes":["M"]}
code:
feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS products_json;
extracted_products = FOREACH feed GENERATE
products_json#'id' AS id,
products_json#'name' AS name,
products_json#'colors' AS colors,
products_json#'sizes' AS sizes;
describe extracted_products;
result:
extracted_products: {id: chararray,name: bytearray,colors: bytearray,sizes: bytearray}
How can I give these fields the correct schema (int, string, array, array), and how can I flatten the array elements into rows?
Thanks in advance.
To convert each JSON array to a bag of single-field tuples, declare the type in the AS clause:
feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS products_json;
extracted_products = FOREACH feed GENERATE
products_json#'id' AS id:chararray,
products_json#'name' AS name:chararray,
products_json#'colors' AS colors:{t:(i:chararray)},
products_json#'sizes' AS sizes:{t:(i:chararray)};
To flatten the bag into one row per element:
flattened = foreach extracted_products generate id,flatten(colors);
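The question also asked for an int id and for both arrays to be turned into rows. A minimal sketch of one way to do that follows. It assumes the Elephant Bird jars are already registered, and that your Elephant Bird version supports the '-nestedLoad' option of JsonLoader, which loads JSON arrays as bags rather than strings. The relation names by_color and by_color_size and the field names color and size are made up for illustration.

```pig
-- Sketch only. Assumes the Elephant Bird jars are already REGISTERed and
-- that JsonLoader accepts '-nestedLoad' in the version you are using.
feed = LOAD '$INPUT'
       USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
       AS (products_json:map[]);

-- Same projection as above: map values are untyped, so the type is
-- declared per field in the AS clause.
extracted_products = FOREACH feed GENERATE
    products_json#'id'     AS id:chararray,
    products_json#'name'   AS name:chararray,
    products_json#'colors' AS colors:{t:(i:chararray)},
    products_json#'sizes'  AS sizes:{t:(i:chararray)};

-- One row per color, with id cast from chararray to int.
by_color = FOREACH extracted_products GENERATE
    (int) id        AS id,
    name,
    FLATTEN(colors) AS color;

-- Flattening two bags in the same FOREACH produces their cross product,
-- i.e. one row per (color, size) combination for each product.
by_color_size = FOREACH extracted_products GENERATE
    (int) id        AS id,
    name,
    FLATTEN(colors) AS color,
    FLATTEN(sizes)  AS size;

DESCRIBE by_color_size;
```

If you need colors and sizes as independent row sets rather than combinations, flatten them in two separate FOREACH statements instead of one.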