Hadoop interview query: MapReduce, Pig, Hive
This question was asked in my Hadoop interview. I have table data like the following.
I bought a new bike. On the 1st day I travelled 20 km. On the 2nd day the meter read 50 (day 1 + day 2). On the 3rd day the meter read 60 (day 1 + day 2 + day 3).
Day Distance
1 20
2 50
3 60
Now the question is, I want the output to look like this:
Day Distance
1 20
2 30
3 10
That is, I want only the distance travelled on each individual day (day 1, day 2 and day 3).
The answer can be in Hive, Pig or MapReduce.
Thanks
This is a running-totals-style problem. You can solve it with this Hive query:
with b as (
select 0 as d, 0 as dst
union all
select d, dst from mytable
)
-- b exposes the columns d and dst, so subtract dst (there is no km column)
SELECT a.d, a.dst - b.dst AS new_dst FROM mytable a, b
WHERE a.d - b.d = 1
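To make the join logic concrete, here is a minimal plain-Java sketch of what the query computes. It is not Hive code; the class name SelfJoinSketch and the method dailyDistances are hypothetical names chosen for illustration, and the readings are assumed to fit in memory as a day-to-reading map.

```java
import java.util.Map;
import java.util.TreeMap;

final class SelfJoinSketch {
    // Mirrors the query: join each row (d, dst) to the row for day d - 1,
    // with a synthetic (0, 0) row standing in for "before day 1".
    static Map<Integer, Integer> dailyDistances(Map<Integer, Integer> cumulativeByDay) {
        Map<Integer, Integer> b = new TreeMap<>(cumulativeByDay);
        b.put(0, 0); // the "select 0 as d, 0 as dst" branch of the UNION ALL

        Map<Integer, Integer> result = new TreeMap<>();
        for (Map.Entry<Integer, Integer> a : cumulativeByDay.entrySet()) {
            Integer previous = b.get(a.getKey() - 1); // the a.d - b.d = 1 condition
            if (previous != null) { // inner join: a day with no predecessor row drops out
                result.put(a.getKey(), a.getValue() - previous);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> readings = new TreeMap<>();
        readings.put(1, 20);
        readings.put(2, 50);
        readings.put(3, 60);
        System.out.println(dailyDistances(readings)); // {1=20, 2=30, 3=10}
    }
}
```

Note that, like the join, this drops any day whose previous day is missing from the data.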
You can use Hive's built-in windowing and analytics functions to get the desired results.
Here is one way.
-- LAG is NULL on the first row, so fall back to that row's own reading
SELECT day, NVL(CAST(distance - LAG(distance) OVER (ORDER BY day) AS INT), distance) AS daily_distance
FROM table;
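For readers unfamiliar with LAG, the following plain-Java sketch shows the same idea outside Hive: walk the readings in day order, remember the previous one, and use a fallback on the first row where there is no previous value. The class name LagSketch and its method are hypothetical, and the sketch assumes the fallback for day 1 is the reading itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LagSketch {
    // Input must already be in day order, as ORDER BY day guarantees in the query.
    static List<Integer> dailyDistances(List<Integer> cumulativeInDayOrder) {
        List<Integer> daily = new ArrayList<>();
        Integer previous = null; // like LAG(distance): no value before the first row
        for (int reading : cumulativeInDayOrder) {
            // First row: nothing to subtract, so keep the reading as it is.
            daily.add(previous == null ? reading : reading - previous);
            previous = reading;
        }
        return daily;
    }

    public static void main(String[] args) {
        System.out.println(dailyDistances(Arrays.asList(20, 50, 60))); // [20, 30, 10]
    }
}
```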
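I tried it in MapReduce.
package hadoop;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;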
public class distance {
    public static class disMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable>
    {
        // Input lines look like: 1<TAB>20
        // Previous cumulative reading. This only gives correct results if a
        // single mapper sees all records in day order.
        int pValue = 0;
        IntWritable outkey = new IntWritable();
        IntWritable outvalue = new IntWritable();

        @Override
        public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException
        {
            String[] cols = values.toString().split("\t");
            int reading = Integer.parseInt(cols[1]);
            outkey.set(Integer.parseInt(cols[0]));
            outvalue.set(reading - pValue); // distance for this day alone
            pValue = reading;
            context.write(outkey, outvalue);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "dfdeff");
        job.setJarByClass(distance.class);
        job.setMapperClass(disMapper.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(0); // map-only job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit with 0 on success and 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
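One caveat with a map-only approach: the previous reading is kept in a field of the mapper, and every mapper task starts with its own fresh copy. The plain-Java sketch below (no Hadoop dependency; SplitBoundarySketch and runMapper are hypothetical names) simulates the same subtraction logic to show what can happen if the input is divided across two mappers instead of one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitBoundarySketch {
    // Same logic as the mapper: subtract the previously seen reading,
    // starting from 0 for every new mapper instance.
    static List<Integer> runMapper(List<Integer> readingsInSplit) {
        int pValue = 0;
        List<Integer> out = new ArrayList<>();
        for (int reading : readingsInSplit) {
            out.add(reading - pValue);
            pValue = reading;
        }
        return out;
    }

    public static void main(String[] args) {
        // One split containing all three days, in order.
        System.out.println(runMapper(Arrays.asList(20, 50, 60))); // [20, 30, 10]

        // Two splits: the second mapper restarts from 0 and reports 60 for day 3.
        System.out.println(runMapper(Arrays.asList(20, 50)) + " " + runMapper(Arrays.asList(60))); // [20, 30] [60]
    }
}
```

For a small file that fits in one input split this does not matter, but for larger inputs the ordering would need to be enforced some other way, for example by sending all records through a single reducer sorted by day.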
The query below also works:
select day, CASE WHEN day = 1 THEN distance ELSE (distance - LAG(distance) over(ORDER BY day)) END AS dailyReading from table;