
How to flatten recursive hierarchy using Hive/Pig/MapReduce

I have unbalanced tree data stored in tabular format, like:



The depth of the tree is unknown.

How can I flatten this hierarchy so that each row contains the entire path from a leaf node to the root node, in the form:

leaf node, root node, intermediate nodes
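
To make the target row format concrete, here is a minimal in-memory Python sketch (not Hive, Pig, or MapReduce). The function name `flatten_paths` and the node names are hypothetical, since the sample table is not reproduced here. It assumes the data is a tree, each child has exactly one parent, and the root never appears in the child column.

```python
def flatten_paths(edges):
    """Build (leaf, root, intermediates) rows from (parent, child) pairs.

    Assumptions (not from the original data): the input is a tree, each
    child has one parent, and the root never appears as a child.
    """
    parent_of = {child: parent for parent, child in edges}
    parents = set(parent_of.values())
    # A leaf is a node that appears as a child but never as a parent.
    leaves = [child for child in parent_of if child not in parents]

    rows = []
    for leaf in leaves:
        path = []  # ancestors, nearest first, root last
        node = parent_of.get(leaf)
        while node is not None:
            path.append(node)
            node = parent_of.get(node)  # None once the root is reached
        rows.append((leaf, path[-1], path[:-1]))
    return rows


# Hypothetical unbalanced tree: A is the root, C and E are leaves.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E")]
for row in flatten_paths(edges):
    print(row)
# ('C', 'A', [])
# ('E', 'A', ['D', 'B'])
```

The loop walks one leaf at a time, which only works when the whole parent lookup fits in memory; that is the part that needs a join-based equivalent on Hadoop.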

Any suggestions for solving this problem using Hive, Pig, or MapReduce? Thanks in advance.

I tried to solve it using Pig; here is the sample code:

Join function:

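-- Join each row's current top ancestor (parent) against the child column of source
define join_hierarchy (leftA, source, result) returns output {
    joined = join $leftA by parent left outer, $source by child;
    -- No match in source: the current parent is the root, so the path is complete
    tmp_filtered = filter joined by $source::parent is null;
    part = foreach tmp_filtered generate $leftA::child as child, $leftA::path as path;
    $result = union part, $result;
    -- Match found: prepend the next ancestor to the path and carry the row forward
    part_remaining = filter joined by $source::parent is not null;
    $output = foreach part_remaining generate $leftA::child as child, $source::parent as parent, CONCAT(CONCAT($source::parent, ':'), $leftA::path) as path;
};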

Load dataset:

-- My dataset's field delimiter is ','.
source = load '*****' using PigStorage(',') as (parent:chararray, child:chararray);
-- Create an additional column for the path
leftA = foreach source generate child, parent, CONCAT(parent, ':') as path;

-- Initially the result table will be blank.
result = limit leftA 1;
result = foreach result generate '' as child, '' as parent;
-- Flatten the hierarchy to 4 levels. Add one call below per level of hierarchy depth.

leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
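
The four calls above hard-code the depth. As a rough illustration of what "repeat until nothing is left to join" would mean for a tree of unknown depth, here is an in-memory Python sketch of the same step. It is not Pig and not a tested Hadoop solution; the function name `flatten_by_repeated_join` and the sample edges are hypothetical, and it assumes the data contains no cycles (a cycle would make the loop run forever).

```python
def flatten_by_repeated_join(edges):
    """Mimic repeated join_hierarchy calls until no rows remain.

    edges: (parent, child) pairs; assumes a root never appears as a child
    and that the hierarchy has no cycles.
    """
    # Plays the role of `source`: look up a node's parent by child name.
    parent_of = {child: parent for parent, child in edges}
    # Plays the role of `leftA`: (child, current top ancestor, path so far).
    pending = [(child, parent, parent + ":") for parent, child in edges]
    finished = []  # plays the role of `result`

    while pending:  # each pass corresponds to one join_hierarchy call
        remaining = []
        for child, top, path in pending:
            next_up = parent_of.get(top)  # left join on source.child
            if next_up is None:
                # No match: `top` is the root, so this path is complete.
                finished.append((child, path))
            else:
                # Match: prepend the next ancestor and keep the row pending.
                remaining.append((child, next_up, next_up + ":" + path))
        pending = remaining
    return finished


print(flatten_by_repeated_join([("A", "B"), ("B", "C")]))
# [('B', 'A:'), ('C', 'A:B:')]
```

Like the macro, this emits a path for every node, not only for leaves, and the number of passes equals the depth of the deepest node. In Pig the equivalent loop would have to be driven from outside the script, which is not shown here.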

