Filter bag by parent value in Apache PIG

I have the following relation in Apache PIG.

TSERIES: {ORDERED: {(timestamp: long,contentHost: chararray)},ts1: long}

And I want to do the following:

F = foreach TSERIES {
    ts = filter ORDERED by timestamp > TSERIES.ts1;
    generate ts;
}

In short, I want to keep all elements of the bag ORDERED whose timestamp is greater than ts1, but Pig won't allow it. Specifically, it rejects this line: ts = filter ORDERED by timestamp > TSERIES.ts1;

Is this possible? I'm using version 0.9.2-cdh4.0.1 (cloudera).

Have you tried:

Test = filter tseries By (ordered.timestamp > ts1);

I'm not sure if there's a way to do this without a UDF. It seems like there should be, but I can't figure it out either. Anyway, you could write a UDF that does this directly: go through the bag, drop the tuples that fail the comparison, and return a new bag. Or you could write a UDF to generate UUIDs, then flatten the bag, filter it, and re-group it, something like this:

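-- tag each row with a unique id so the bag can be rebuilt after flattening
a = foreach TSERIES generate ORDERED, ts1, myudfs.GenerateUUID() as id;
b = foreach a generate FLATTEN(ORDERED) as (timestamp, contentHost), ts1, id;
c = filter b by timestamp > ts1;
-- one group per original row; rows with no surviving tuples disappear
d = group c by id;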
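
For the first option mentioned above (a UDF that walks the bag directly), the core logic is just a loop over the bag with a comparison against ts1. Here is a minimal sketch of that loop in plain Java, without the Pig classes, so it can be run on its own. The names BagFilterSketch, Row and filterAfter are made up for illustration. In a real UDF this loop would presumably sit inside the exec(Tuple) method of a class extending EvalFunc<DataBag>, reading the bag and ts1 from the input tuple, and that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagFilterSketch {

    /** Hypothetical stand-in for one tuple of the ORDERED bag: (timestamp, contentHost). */
    static final class Row {
        final long timestamp;
        final String contentHost;

        Row(long timestamp, String contentHost) {
            this.timestamp = timestamp;
            this.contentHost = contentHost;
        }
    }

    /** Keeps the rows whose timestamp is strictly greater than ts1, preserving their order. */
    static List<Row> filterAfter(List<Row> ordered, long ts1) {
        List<Row> kept = new ArrayList<>();
        for (Row row : ordered) {
            if (row.timestamp > ts1) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample data, invented for the example.
        List<Row> ordered = Arrays.asList(
            new Row(100L, "a.example.com"),
            new Row(200L, "b.example.com"),
            new Row(300L, "c.example.com"));

        // With ts1 = 150 only the last two rows pass the filter.
        for (Row row : filterAfter(ordered, 150L)) {
            System.out.println(row.timestamp + "\t" + row.contentHost);
        }
    }
}
```

Once wrapped as a UDF, it would be called from Pig along the lines of F = foreach TSERIES generate myudfs.FilterAfter(ORDERED, ts1); where myudfs.FilterAfter is again a hypothetical name. This avoids the flatten and re-group round trip, at the cost of maintaining Java code.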
