
Merge two bags and get all the fields from the first bag in Pig

I am new to Pig scripting and need some help with this issue.

I have two bags in Pig. I want to keep every field from the first bag, but overwrite a field's value with the value from the second bag whenever the second bag has data for that same field.

The column list is dynamic (columns may be added or removed at any time). Set b may also contain data in other fields that are currently blank; if so, set a must be overwritten with the data available in set b.

Columns: uniqueid, catagory, b, c, d, e, f, region, g, h, date, direction, indicator

For example:

all_data = COGROUP a BY (uniqueid), b BY (uniqueid);

Output:

(1,{(1,test,,,,,,,,city,,,,,2020-06-08T18:31:09.000Z,west,,,,,,,,,,,,,A)},{(1,,,,,,,,,,,,,,2020-09-08T19:31:09.000Z,,,,,,,,,,,,,,N)})
    
(2,{(2,test2,,,,,,,,dist,,,,,2020-08-02T13:06:16.000Z,east,,,,,,,,,,,,A)},{(2,,,,,,,,,,,,,,2020-09-08T18:31:09.000Z,,,,,,,,,,,,,,N)})

Expected result:

(1,test,,,,,,,,city,,,,,2020-09-08T19:31:09.000Z,west,,,,,,,,,,,,,N)
(2,test2,,,,,,,,dist,,,,,2020-09-08T18:31:09.000Z,east,,,,,,,,,,,,N)
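The overwrite rule described above (keep the value from a unless b has a non-empty value in the same position) can be sketched in a few lines of Python. This is only an illustration of the logic, not Pig code: the names `overlay` and `parse` are hypothetical, the sample rows are shortened to seven fields, and treating `None` and the empty string as "blank" is an assumption. Because it works by position, it does not depend on a fixed column list.

```python
def overlay(a_row, b_row):
    """Return a_row with each field replaced by b_row's value where b has one.

    Assumption: a blank field is represented as None or an empty string.
    """
    if len(a_row) != len(b_row):
        raise ValueError("rows must have the same number of fields")
    return tuple(a if b is None or b == "" else b for a, b in zip(a_row, b_row))


def parse(line):
    # Hypothetical helper: split a comma-separated row into a tuple of fields.
    return tuple(line.split(","))


# Shortened sample rows (7 fields instead of the full column list).
a = parse("1,test,,city,2020-06-08T18:31:09.000Z,west,A")
b = parse("1,,,,2020-09-08T19:31:09.000Z,,N")

merged = overlay(a, b)
print(",".join(merged))  # 1,test,,city,2020-09-08T19:31:09.000Z,west,N
```

Pig can call Python functions as Jython UDFs, so logic of this kind could in principle be adapted into a UDF applied to the two bags after the COGROUP; the wiring for that is not shown here.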

I was able to achieve the expected output with the statement below:

final = FOREACH all_data GENERATE flatten($1), flatten($2.(region)) as region, flatten($2.(indicator)) as indicator;
