
SQL-HIVE-PIG -Mapreduce

Each line has 5 columns, separated by commas:

1st column is name
2nd column is date_of_purchase
3rd column is product
4th column is mode of payment
5th column is total_amount

I hope that makes clear what the data contains:

surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000

I want to see how much each person has spent via online mode and via offline mode.

Basically, the reducer output should look like this:

surender   OFFLINE   20000
surender   ONLINE    58000
Raja       OFFLINE   8000
Raja       ONLINE    32000
Kumar      OFFLINE   4000
Kumar      ONLINE    50000

And the final output should look like this:

surender   20000   58000
Raja       8000    32000
Kumar      4000    50000

You can give me a Hive or Pig query, or a MapReduce program.

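-- Load the comma-separated file; each record has five fields.
A = LOAD 'file_name' using PigStorage(',') as (name:chararray,date:chararray,product:chararray,mode:chararray,total:long);
-- Group by person and payment mode, then sum the amounts within each group.
B = GROUP A BY (name,mode);
C = FOREACH B GENERATE group.name as name, group.mode as mode, SUM(A.total) as total;
-- Collect each person's per-mode totals into a single row (a bag of totals).
D = GROUP C BY name;
E = FOREACH D GENERATE group, C.total;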

If your data, like the sample you provided, spells the same name with different capitalization (surender vs. Surender), you need to convert the name to a single case (for example with UPPER) before grouping; otherwise those rows land in separate groups.
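To check the numbers independently of Pig, here is a minimal Python sketch of the same map/reduce idea, run on the sample rows from the question. It is only an illustration, not a Hadoop job: the function names (`map_record`, `reduce_totals`, `pivot`) are made up for this example, and it assumes every row has exactly five comma-separated fields and that names should be upper-cased before grouping.

```python
from collections import defaultdict

# Sample rows copied from the question.
SAMPLE = """surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000"""


def map_record(line):
    """Map step: emit ((NAME, MODE), amount); the name is upper-cased (assumption)."""
    name, _date, _product, mode, amount = line.strip().split(",")
    return (name.upper(), mode.upper()), int(amount)


def reduce_totals(lines):
    """Reduce step: sum the amounts for each (NAME, MODE) key."""
    totals = defaultdict(int)
    for line in lines:
        if line.strip():  # skip blank lines
            key, amount = map_record(line)
            totals[key] += amount
    return dict(totals)


def pivot(totals):
    """Final step: one row per person holding the OFFLINE and ONLINE totals."""
    rows = {}
    for (name, mode), amount in totals.items():
        rows.setdefault(name, {"OFFLINE": 0, "ONLINE": 0})[mode] = amount
    return rows


totals = reduce_totals(SAMPLE.splitlines())
report = pivot(totals)
for name, by_mode in report.items():
    print(name, by_mode["OFFLINE"], by_mode["ONLINE"])
```

Without the `upper()` call on the name, "surender" and "Surender" would be counted as two different people, which is the same problem the grouping in Pig runs into.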
