How to flatten a recursive hierarchy using Hive/Pig/MapReduce

I have unbalanced tree data stored in tabular format, like:

parent,child
a,b
b,c
c,d
c,f
f,g

The depth of the tree is unknown.

How can I flatten this hierarchy so that each row contains the entire path from a leaf node to the root node, as in:

leaf node, root node, intermediate nodes
d,a,d:c:b
f,a,e:b

Any suggestions for solving the above problem using Hive, Pig, or MapReduce? Thanks in advance.
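To make the target format concrete, here is a minimal in-memory Python sketch of the flattening. It is only a reference for the expected rows, not a Hive/Pig/MapReduce solution. The function name `flatten_hierarchy` is hypothetical, and the sketch assumes every child has exactly one parent and the data contains no cycles. It follows the pattern of the first sample row (leaf, root, then the path from the leaf up to but not including the root).

```python
def flatten_hierarchy(edges):
    """Return (leaf, root, path) rows for a list of (parent, child) pairs.

    Assumes each child has a single parent and there are no cycles.
    """
    parent_of = {child: parent for parent, child in edges}
    parents = set(parent_of.values())

    # A leaf is a child that never appears in the parent column.
    leaves = [child for _, child in edges if child not in parents]

    rows = []
    for leaf in leaves:
        path = [leaf]
        node = leaf
        # Climb towards the root one parent at a time.
        while node in parent_of:
            node = parent_of[node]
            path.append(node)
        root = path[-1]
        # The path column lists every node except the root, leaf first.
        rows.append((leaf, root, ":".join(path[:-1])))
    return rows


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]
for row in flatten_hierarchy(edges):
    print(",".join(row))
# prints:
# d,a,d:c:b
# g,a,g:f:c:b
```

With the sample data above, the only leaves are d and g, so two rows come out.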

I tried to solve it using Pig; here is the sample code:

Join function:

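-- Join each row with the row of its current parent to climb one level up the tree
-- (the return alias is named "out" because "output" is a reserved keyword in Pig)
define join_hierarchy (leftA, source, result) returns out {
    joined = join $leftA by parent left, $source by child;
    -- rows with no match on the right side have already reached the root
    tmp_filtered = filter joined by $source::parent is null;
    part = foreach tmp_filtered generate $leftA::child as child, $leftA::path as path;
    $result = union part, $result;
    -- the remaining rows still have an ancestor: prepend it to the path
    part_remaining = filter joined by $source::parent is not null;
    $out = foreach part_remaining generate $leftA::child as child, $source::parent as parent, CONCAT(CONCAT($source::parent, ':'), $leftA::path) as path;
};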

Load dataset:

--My dataset field delimiter is ','.
source = load '*****' using PigStorage(',') as (parent:chararray, child:chararray);
--create an additional column for the path
leftA = foreach source generate child, parent, CONCAT(parent, ':') as path;

--initially result table will be blank.
result= limit leftA 1;
result= foreach result generate '' as child , '' as parent;
--Flatten hierarchy to 4 levels. Repeat the line below once per level of hierarchy depth.

leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
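Since the depth is unknown, the fixed number of macro calls above would ideally be replaced by a loop that repeats the join until no row has a parent left to climb to. Pig Latin itself has no loop construct, so such a loop would have to live in a driver script. The Python sketch below only mirrors the idea in memory: `join_hierarchy` and `flatten` are illustrative stand-ins written for this example, not Pig code, and they assume the data has no cycles (otherwise the loop would never end). Like the Pig attempt, it emits a row for every node (not only leaves) and builds the path root-first with a trailing ':'.

```python
def join_hierarchy(left, source):
    """One round: look up the parent of each row's current top-most ancestor.

    `left` holds (child, parent, path) rows; `source` holds (parent, child) pairs.
    Returns (finished, remaining): rows that reached the root, and rows moved up one level.
    """
    parent_of = {child: parent for parent, child in source}
    finished, remaining = [], []
    for child, parent, path in left:
        grandparent = parent_of.get(parent)
        if grandparent is None:
            # No match in the join: `parent` is the root.
            finished.append((child, parent, path))
        else:
            # Climb one level and prepend the new ancestor to the path.
            remaining.append((child, grandparent, grandparent + ":" + path))
    return finished, remaining


def flatten(source):
    """Repeat the join until every row has reached the root; also report the round count."""
    left = [(child, parent, parent + ":") for parent, child in source]
    result, rounds = [], 0
    while left:
        finished, left = join_hierarchy(left, source)
        result.extend(finished)
        rounds += 1
    return result, rounds


source = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]
rows, rounds = flatten(source)
for row in rows:
    print(",".join(row))
print("rounds:", rounds)
# prints:
# b,a,a:
# c,a,a:b:
# d,a,a:b:c:
# f,a,a:b:c:
# g,a,a:b:c:f:
# rounds: 4
```

For the sample data the loop stops after 4 rounds, which matches the four hand-written macro calls, but the count adapts automatically when the tree is deeper.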
