[英]pig programming to use split on group by having count(*)
輸入文件為:
2, cornflakes, Regular,General Mills, 12
3, cornflakes, Mixed Nuts, Post, 14
4, chocolate syrup, Regular, Hersheys, 5
5, chocolate syrup, No High Fructose, Hersheys, 8
6, chocolate syrup, Regular, Ghirardeli, 6
7, chocolate syrup, Strawberry Flavor, Ghirardeli, 7
filter3 = LOAD 'location_of_file' using PigStorage('\t') as (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
SPLIT filter3 INTO filter4 IF (FOREACH (filter3 GROUP BY item) GENERATE group, COUNT(item < 3)), filter6_pass OTHERWISE;
就像對帶有count(*)<3的項進行分組的SQL一樣
所需的輸出是:
4, chocolate syrup, Regular, Hersheys, 5
5, chocolate syrup, No High Fructose, Hersheys, 8
6, chocolate syrup, Regular, Ghirardeli, 6
7, chocolate syrup, Strawberry Flavor, Ghirardeli, 7
按項目分組,獲取計數,然后對計數使用過濾器
A = LOAD 'location_of_file' using PigStorage('\t') as (item_sl : int, item : chararray, type: chararray, manufacturer: chararray, price : int);
B = GROUP A BY item;
C = FOREACH B GENERATE group,COUNT(A.item) AS Total;
D = FILTER C BY Total > 3;
E = JOIN A BY item,D BY $0;
F = FOREACH E GENERATE $0..$4;
DUMP F;
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.