Filtering a bag by a parent value in Apache Pig

Question

I have the following relation in Apache Pig.

TSERIES: {ORDERED: {(timestamp: long,contentHost: chararray)},ts1: long}

And I want to do the following:

F = foreach TSERIES {
    ts = filter ORDERED by timestamp > TSERIES.ts1;
    generate ts;
}

In short, I want to keep all elements of the bag ORDERED with a timestamp higher than ts1, but Pig won't allow it, specifically this part: ts = filter ORDERED by timestamp > TSERIES.ts1;

Is this possible? I'm using version 0.9.2-cdh4.0.1 (Cloudera).

Answer 1

Did you try:

Test = filter TSERIES by (ORDERED.timestamp > ts1);

Answer 2

I'm not sure if there's a way to do this without a UDF... it seems like there should be, but I can't figure it out either. Anyway, you could write a UDF to do this directly: go through the bag, filter out some tuples, and return a bag. Or you could write a UDF to generate UUIDs, then flatten the bag and re-group it, something like this:

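-- myudfs.GenerateUUID() is a user-written UDF that tags each row with a unique id
a = foreach TSERIES generate ORDERED, ts1, myudfs.GenerateUUID() as id;
-- flattening a bag of two-field tuples yields two fields, so name both
b = foreach a generate FLATTEN(ORDERED) as (timestamp, contentHost), ts1, id;
c = filter b by timestamp > ts1;
-- regroup by id to rebuild one bag per original row
d = group c by id;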
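
For the first option (a UDF that filters the bag directly), here is a minimal sketch written as a Jython-style Python UDF. The function name filter_after and the output schema string are my own choices, not anything from the question, and I'm assuming Pig hands the bag to the function as a list of tuples and makes the outputSchema decorator available when the script is registered. The fallback at the top only exists so the file can also be run outside Pig.

```python
# Sketch of a bag-filtering UDF; names here are illustrative, not from the question.
try:
    outputSchema  # assumed to be provided by Pig when registered via Jython
except NameError:
    # Fallback so the script can be exercised outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("ts:{t:(timestamp:long,contentHost:chararray)}")
def filter_after(bag, threshold):
    """Keep only the tuples whose first field (timestamp) exceeds threshold."""
    if bag is None or threshold is None:
        return None
    # Tuples with a null timestamp are dropped rather than compared.
    return [t for t in bag if t[0] is not None and t[0] > threshold]
```

If this were saved as, say, bagfilter.py (a hypothetical file name), it could be registered with register 'bagfilter.py' using jython as myudfs; and then called as F = foreach TSERIES generate myudfs.filter_after(ORDERED, ts1); I haven't tried this on 0.9.2-cdh4.0.1, so treat it as a starting point.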

Answer 1: 0 2012-10-03 21:21:48

Answer 2: 0 2012-10-03 23:42:09
