How to sum 2 log files in Pig
I have a problem summing 2 log files.
Sample files:
File 1
id user view
1 AAA 2
2 BBB 5
3 CCC 9
File 2
id user view address
1 AAA 5 XXX
2 BBB 2 YYY
6 FFF 4 ZZZ
I want to combine the two files by id and sum the view values. This is the output I expect:
Output:
id user view address
1 AAA 7 XXX
2 BBB 7 YYY
I tried the code below, which joins the two files, but it does not sum the view values:
My code:
inputdata = LOAD '/user/hdfs/tes/part-1' AS (
id:chararray,
user:chararray,
view:int
);
inputdata2 = LOAD '/user/hdfs/tes/part-2' AS (
id:chararray,
user:chararray,
view:int,
address:chararray
);
joined = JOIN inputdata BY id LEFT OUTER, inputdata2 by id;
outputlist = FOREACH joined {
GENERATE
inputdata::id,
inputdata::user,
--sum(inputdata2::view),
inputdata2::address;
}
dump outputlist;
My question is how to sum the view values from the two log files.
Thanks.
Take the join result and add the two view values together in a FOREACH.
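-- Load both space-delimited files
A = LOAD 'file1.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int, d:chararray);
-- Inner join on id: only ids present in both files are kept
C = JOIN A BY a, B BY a;
-- Add the view columns from both sides of the join
D = FOREACH C GENERATE A::a AS id, A::b AS user, A::c + B::c AS view, B::d AS address;
DUMP D;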
Output:
(1,AAA,7,XXX)
(2,BBB,7,YYY)
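Because the join above is an inner join, ids that appear in only one file (3 and 6 in the sample data) are dropped, which matches the expected output. If rows that exist only in file 1 should be kept as well, as the LEFT OUTER join in the question would do, a minimal sketch could look like the following. It assumes the same space-delimited files and aliases as above. The null check is there because adding null to an int in Pig yields null, so an unmatched row would otherwise lose its view value.

```pig
A = LOAD 'file1.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int);
B = LOAD 'file2.dat' USING PigStorage(' ') AS (a:chararray, b:chararray, c:int, d:chararray);
-- Keep every row of A; B's fields are null where no matching id exists
C = JOIN A BY a LEFT OUTER, B BY a;
D = FOREACH C GENERATE
    A::a AS id,
    A::b AS user,
    -- Treat a missing view from file 2 as 0 so the sum is not null
    A::c + (B::c IS NULL ? 0 : B::c) AS view,
    B::d AS address;
DUMP D;
```

With the sample files, this should additionally keep id 3 with its original view count and an empty address.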