Iterating through an array in Pig

Question

I have records with structures as follows:

"event" : [ {"x":"1","y":"2"} , {"x":"5","y":"2"}]
"event" : [ {"random":"r", "pol" : "t", "a" : "b"} , {"x":"4","y":"5"}]
"event" : [ {"random":"f", "pol" : "w", "a" : "r"} , {"x":"12","y":"5"} , {"x":"6","y":"7"}]

The fields of interest to me are x and y. For each record, I need to extract the map that has the highest value of x.

I.e., for the first event, pick {"x":"5","y":"2"}, for the second {"x":"4","y":"5"}, and for the third {"x":"12","y":"5"}.

I know that we can use a UDF to iterate through each map in the array and pick the one with the max x value, but is there a way I can do this without writing a UDF?

Answer 1

You can do something like this.

REGISTER elephant-bird-core-4.3.jar;
REGISTER elephant-bird-hadoop-compat-4.5.jar;
REGISTER elephant-bird-pig-4.5.jar;

DEFINE JsonLoader com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true');

records = LOAD '$DATA_PATH' USING JsonLoader() AS (data: map[]);
events = FOREACH records GENERATE
        FLATTEN(data#'event') AS event;

grouped_events = COGROUP events by event#'x', event#'y';     

result = FOREACH grouped_events GENERATE
        MAX(events.event#'x'),
        MAX(events.event#'y');

The -nestedLoad option helps load JSON arrays, which we can then flatten into separate events as above.
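Note that the snippet above groups by the (x, y) pair itself, so each MAX simply returns that group's own value rather than one map per record. A possible UDF-free variant that picks the map with the highest x for each record might look like the sketch below. It is untested and rests on several assumptions: every record carries a unique `id` field (hypothetical, not shown in the question), x parses as an integer, and the relation names `xy`, `with_x`, `by_record`, and `top1` are purely illustrative.

```pig
-- Same jars and loader definition as in the answer above.
REGISTER elephant-bird-core-4.3.jar;
REGISTER elephant-bird-hadoop-compat-4.5.jar;
REGISTER elephant-bird-pig-4.5.jar;

DEFINE JsonLoader com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true');

records = LOAD '$DATA_PATH' USING JsonLoader() AS (data: map[]);

-- ASSUMPTION: each record has a unique 'id' field (hypothetical; not in the question).
-- Flattening yields one row per map in the 'event' array, tagged with its record id.
events = FOREACH records GENERATE
    (chararray) data#'id' AS id,
    FLATTEN(data#'event') AS (event: map[]);

-- Keep only x and y. ASSUMPTION: x holds an integer value.
xy = FOREACH events GENERATE
    id,
    (int) event#'x' AS x,
    (chararray) event#'y' AS y;

-- Maps without an x (e.g. {"random": ..., "pol": ...}) produce a null x; drop them.
with_x = FILTER xy BY x IS NOT NULL;

by_record = GROUP with_x BY id;

-- Within each record, sort by x descending and keep only the first row.
result = FOREACH by_record {
    sorted = ORDER with_x BY x DESC;
    top1 = LIMIT sorted 1;
    GENERATE FLATTEN(top1);
};

DUMP result;
```

The nested ORDER and LIMIT inside FOREACH are built-in Pig operators, so no custom UDF is involved. If the records have no natural key, one would have to be added before flattening, since the grouping step needs some way to tell which record each flattened map came from.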

Answer 1: -1, 2016-03-07 22:06:02