
How can I define a schema for an array using JsonLoader?

I am using the elephant-bird project to load a JSON file into Pig, but I am not sure how to define the schema at load time. I could not find any documentation on this.

Data:

{"id":22522,"name":"Product1","colors":["Red","Blue"],"sizes":["S","M"]}
{"id":22523,"name":"Product2","colors":["White","Blue"],"sizes":["M"]}

Code:

feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS products_json;

extracted_products = FOREACH feed GENERATE
    products_json#'id' AS id,
    products_json#'name' AS name,
    products_json#'colors' AS colors,
    products_json#'sizes' AS sizes;

describe extracted_products;

Result:

extracted_products: {id: chararray,name: bytearray,colors: bytearray,sizes: bytearray}

How can I give these fields the correct types (int, string, array, array), and how can I flatten the array elements into rows?

Thanks in advance.

To convert the JSON arrays into bags of tuples, declare their schema in the FOREACH ... GENERATE:

-- '-nestedLoad' makes elephant-bird turn JSON arrays into bags of tuples
-- (without it, array values come back as plain JSON strings)
feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS products_json;

extracted_products = FOREACH feed GENERATE
    products_json#'id' AS id:chararray,
    products_json#'name' AS name:chararray,
    products_json#'colors' AS colors:{t:(i:chararray)},
    products_json#'sizes' AS sizes:{t:(i:chararray)};
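To make the shape of those declared types concrete, here is a rough local model in plain Python (not Pig) of what each row of extracted_products holds for the two sample records. It assumes a Pig bag with schema {t:(i:chararray)} can be pictured as a list of one-element tuples; the helper name extract_product is hypothetical and is not part of Pig or elephant-bird.

```python
import json

# The two sample input lines from the question.
LINES = [
    '{"id":22522,"name":"Product1","colors":["Red","Blue"],"sizes":["S","M"]}',
    '{"id":22523,"name":"Product2","colors":["White","Blue"],"sizes":["M"]}',
]


def extract_product(line):
    """Model one row of extracted_products (hypothetical helper, not a Pig API).

    A Pig bag with schema {t:(i:chararray)} is modelled here as a list of
    one-element tuples, e.g. [("Red",), ("Blue",)].
    """
    record = json.loads(line)
    return (
        str(record["id"]),                  # id:chararray
        record["name"],                     # name:chararray
        [(c,) for c in record["colors"]],   # colors:{t:(i:chararray)}
        [(s,) for s in record["sizes"]],    # sizes:{t:(i:chararray)}
    )


extracted_products = [extract_product(line) for line in LINES]
for row in extracted_products:
    print(row)
```

The first printed row is ('22522', 'Product1', [('Red',), ('Blue',)], [('S',), ('M',)]): each array element sits in its own single-field tuple, which is exactly what FLATTEN later unpacks.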

To flatten a bag into one row per element, use FLATTEN:

flattened = FOREACH extracted_products GENERATE id, FLATTEN(colors);
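FLATTEN on a bag emits one output row per tuple in the bag, repeating the other generated fields; if two bags are flattened in the same GENERATE (for example `GENERATE id, FLATTEN(colors), FLATTEN(sizes)`), Pig produces their cross product. The sketch below models both cases in plain Python, not Pig, reusing the assumption that a bag is a list of one-element tuples; the function names are hypothetical.

```python
from itertools import product

# Rows shaped like extracted_products: (id, name, colors_bag, sizes_bag),
# with each bag modelled as a list of one-element tuples.
EXTRACTED = [
    ("22522", "Product1", [("Red",), ("Blue",)], [("S",), ("M",)]),
    ("22523", "Product2", [("White",), ("Blue",)], [("M",)]),
]


def flatten_colors(rows):
    """Model of: GENERATE id, FLATTEN(colors) -> one row per colour."""
    return [(id_, *color) for id_, _name, colors, _sizes in rows for color in colors]


def flatten_colors_and_sizes(rows):
    """Model of: GENERATE id, FLATTEN(colors), FLATTEN(sizes).

    Flattening two bags in one GENERATE yields their cross product.
    """
    return [
        (id_, *color, *size)
        for id_, _name, colors, sizes in rows
        for color, size in product(colors, sizes)
    ]


print(flatten_colors(EXTRACTED))
print(flatten_colors_and_sizes(EXTRACTED))
```

Product1 has two colours and two sizes, so flattening both bags gives four rows for it; if that cross product is not wanted, flatten each array in a separate FOREACH instead.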
