
How can I define a schema for an array using JsonLoader?

I am using the Elephant Bird project to load a JSON file into Pig, but I am not sure how to define the schema at load time. I could not find any documentation on this.

Data:

{"id":22522,"name":"Product1","colors":["Red","Blue"],"sizes":["S","M"]}
{"id":22523,"name":"Product2","colors":["White","Blue"],"sizes":["M"]}

Code:

feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS products_json;

extracted_products = FOREACH feed GENERATE
    products_json#'id' AS id,
    products_json#'name' AS name,
    products_json#'colors' AS colors,
    products_json#'sizes' AS sizes;

describe extracted_products;

Result:

extracted_products: {id: chararray,name: bytearray,colors: bytearray,sizes: bytearray}

How can I give the fields the correct schema (int, string, array, array), and how can I flatten the array elements into rows?

Thanks in advance.

To convert each JSON array to a bag of single-field tuples, declare the types in the AS clause:

feed = LOAD '$INPUT' USING com.twitter.elephantbird.pig.load.JsonLoader() AS products_json;

-- Declare a type for each field; each array becomes a bag of one-field tuples.
extracted_products = FOREACH feed GENERATE
    products_json#'id' AS id:chararray,
    products_json#'name' AS name:chararray,
    products_json#'colors' AS colors:{t:(i:chararray)},
    products_json#'sizes' AS sizes:{t:(i:chararray)};
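The question asked for `id` as an integer, and whether the arrays actually arrive as bags may depend on how the loader is configured. The following is only a sketch under assumptions not stated in the answer: that Elephant Bird's JsonLoader accepts a '-nestedLoad' option which loads nested JSON arrays as bags rather than strings, that your Pig version accepts an explicit (int) cast on a map value, and that the Elephant Bird jars are already registered. The alias typed_products is made up for illustration.

```pig
-- Sketch only: assumes the Elephant Bird jars are already registered.
-- Assumption: '-nestedLoad' makes JsonLoader load nested arrays as bags.
feed = LOAD '$INPUT'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (products_json:map[]);

-- typed_products is a hypothetical alias; the casts set the scalar types.
typed_products = FOREACH feed GENERATE
    (int) products_json#'id' AS id,
    (chararray) products_json#'name' AS name,
    products_json#'colors' AS colors:{t:(i:chararray)},
    products_json#'sizes' AS sizes:{t:(i:chararray)};

-- Check the resulting schema before relying on it.
DESCRIBE typed_products;
```

If DESCRIBE still reports bytearray for a field, the cast or the loader option is probably not taking effect in your environment.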

To flatten a bag into rows:

flattened = foreach extracted_products generate id,flatten(colors);
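Since the question has two array fields, it may help to see how FLATTEN behaves with more than one bag. This is a sketch that reuses the extracted_products relation above and assumes colors and sizes really are bags; the aliases by_color and by_color_size are made up for illustration.

```pig
-- One output row per (id, name, color): FLATTEN un-nests the colors bag.
by_color = FOREACH extracted_products GENERATE
    id, name, FLATTEN(colors) AS color;

-- Two FLATTENs in the same GENERATE produce the cross product of the bags.
by_color_size = FOREACH extracted_products GENERATE
    id, FLATTEN(colors) AS color, FLATTEN(sizes) AS size;
```

With the sample data, by_color_size should contain four rows for Product1 (two colors times two sizes) and two rows for Product2 (two colors times one size). If you want colors and sizes as independent row sets instead, flatten them in separate statements.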
