I have the following relation in Apache Pig:
TSERIES: {ORDERED: {(timestamp: long,contentHost: chararray)},ts1: long}
And I want to do the following:
F = foreach TSERIES {
ts = filter ORDERED by timestamp > TSERIES.ts1;
generate ts;
}
In short, I want to keep all elements of the bag ORDERED with a timestamp higher than ts1, but Pig won't allow it, specifically this part: ts = filter ORDERED by timestamp > TSERIES.ts1;
Is this possible? I'm using version 0.9.2-cdh4.0.1 (Cloudera).
Did you try:
Test = filter TSERIES by (ORDERED.timestamp > ts1);
I'm not sure if there's a way to do this without a UDF. It seems like there should be, but I can't figure it out either. Anyway, you could either write a UDF to do this directly (go through the bag, filter out some tuples, and return a bag), or write a UDF to generate UUIDs, then flatten the bag and re-group it, something like this:
a = foreach TSERIES generate ORDERED, ts1, myudfs.GenerateUUID() as id;
b = foreach a generate FLATTEN(ORDERED) as (timestamp, contentHost), ts1, id;
c = filter b by timestamp > ts1;
d = group c by id;
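For the first option (a UDF that filters the bag directly), here is a minimal sketch written as a Jython UDF, which Pig supports alongside Java. The function name filter_after and the file name bagfilter.py are made up for illustration, and I haven't run this against 0.9.2-cdh4.0.1, so treat it as a starting point. It assumes the bag arrives as a list of tuples with the timestamp in the first position, matching the schema in the question.

```python
# bagfilter.py (hypothetical file name)
#
# Assumed usage from Pig:
#   register 'bagfilter.py' using jython as myudfs;
#   F = foreach TSERIES generate myudfs.filter_after(ORDERED, ts1) as ts;

# Pig makes an outputSchema decorator available when it loads a Jython UDF.
# This fallback only exists so the file can also be run outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("ts:bag{t:tuple(timestamp:long,contentHost:chararray)}")
def filter_after(bag, ts1):
    """Keep only the tuples whose first field (timestamp) is greater than ts1."""
    if bag is None or ts1 is None:
        # Pass nulls through instead of guessing what they should mean.
        return None
    return [t for t in bag if t[0] is not None and t[0] > ts1]
```

Compared with the flatten/filter/group approach, this keeps one output row per input row and needs no generated id, at the cost of maintaining a small script next to the Pig job.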