Filter bag by parent value in Apache PIG
I have the following relation in Apache Pig:
TSERIES: {ORDERED: {(timestamp: long,contentHost: chararray)},ts1: long}
I want to do the following:
F = foreach TSERIES {
ts = filter ORDERED by timestamp > TSERIES.ts1;
generate ts;
}
In short, I want to keep all elements of the bag ORDERED with a timestamp higher than ts1, but Pig won't allow it. Specifically, it rejects this part:
ts = filter ORDERED by timestamp > TSERIES.ts1;
Is this possible? I'm using version 0.9.2-cdh4.0.1 (Cloudera).
Did you try:
Test = filter tseries By (ordered.timestamp > ts1);
I'm not sure if there's a way to do this without a UDF. It seems like there should be, but I can't figure it out either. Anyway, you could write a UDF to do this directly: go through the bag, filter out some elements, and return a bag. Or, you could write a UDF to generate UUIDs, then flatten the bag and re-group it, something like this:
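-- Tag every row with a unique id so its bag can be rebuilt after flattening.
a = foreach TSERIES generate ORDERED, ts1, myudfs.GenerateUUID() as id;
-- FLATTEN yields one row per bag element, with one field per tuple field.
b = foreach a generate FLATTEN(ORDERED) as (timestamp, contentHost), ts1, id;
c = filter b by timestamp > ts1;
-- Re-group by id to get one bag of surviving elements per original row.
d = group c by id;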
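The first option mentioned above, a UDF that walks the bag and returns only the matching elements, could look roughly like the sketch below. It is written as a Jython UDF rather than Java purely to keep it short. The file name `bagfilters.py` and the function name `filter_after` are made up for illustration, and the sketch assumes that Pig hands the bag to Jython as a list of tuples and supplies the `outputSchema` decorator when the script is registered. The fallback definition is there only so the file also loads under plain Python.

```python
# bagfilters.py (hypothetical file name)

# Assumption: when registered as a Jython UDF, Pig provides outputSchema.
# The fallback below is a no-op decorator so the file also runs outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("ts:{(timestamp:long,contentHost:chararray)}")
def filter_after(bag, ts1):
    """Keep the tuples of bag whose first field (timestamp) is greater than ts1."""
    if bag is None or ts1 is None:
        return None
    # Each element is a (timestamp, contentHost) tuple; skip null timestamps.
    return [t for t in bag if t[0] is not None and t[0] > ts1]
```

If this works on your Pig version, it would be registered with something like `register 'bagfilters.py' using jython as bagfilters;` and called as `F = foreach TSERIES generate bagfilters.filter_after(ORDERED, ts1);`. Unlike the flatten-and-regroup approach, rows whose bag ends up empty are kept (with an empty bag) instead of disappearing.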