The following fails:
data = FOREACH rawData GENERATE (int) col;
aggregate = FOREACH data GENERATE MIN(col);
Is there some way I can get the above to work?
I tried this:
data = FOREACH rawData GENERATE 1 dummy, (int) col;
grouped = GROUP data BY dummy;
aggregate = FOREACH grouped GENERATE MIN(data.col);
Now I get: java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
There are literally 11 rows of integers (1..11), so I'm not sure why I'm getting an OutOfMemoryError.
I'm running the script from the command line: pig -f myscript.pig
I'm new to Pig, so if I need to set something, please let me know.
Your intuition was correct: you need to group the data before using MIN. You can use GROUP ALL for this purpose:
data = FOREACH rawData GENERATE (int) col;
grouped = GROUP data ALL;
aggregate = FOREACH grouped GENERATE MIN($1);
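After a GROUP operation, $0 contains the group key (for GROUP ALL, the constant 'all') and $1 contains a bag holding all the tuples of the grouped relation. The bag is named after that relation, so here MIN($1) is equivalent to MIN(data.col).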
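To see what GROUP ALL actually produces, it can help to run DESCRIBE on the grouped relation before aggregating. The following is a minimal sketch, not a definitive script: the file name input.txt, its one-value-per-line layout, and the output alias min_col are assumptions made for illustration and do not come from the question.

```pig
-- Assumption: 'input.txt' is a hypothetical file with one integer per line.
rawData = LOAD 'input.txt' AS (col:chararray);

-- Cast to int and keep the field name so it can be referenced later.
data = FOREACH rawData GENERATE (int) col AS col;

-- GROUP ALL puts every tuple of data into a single group.
grouped = GROUP data ALL;

-- Prints the schema: a 'group' field plus a bag named 'data'.
DESCRIBE grouped;

-- Project the col field out of the bag and take its minimum.
aggregate = FOREACH grouped GENERATE MIN(data.col) AS min_col;
DUMP aggregate;
```

Referring to the bag by name (data.col) rather than by position ($1) keeps the script readable if more fields are added to data later.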