Hadoop interview query: MapReduce, Pig, Hive

This question was asked in my Hadoop interview. I have table data like the following.

I bought a new bike. On the 1st day I travelled 20 km. On the 2nd day the reading on the meter was 50 (day 1 + day 2). On the 3rd day the reading on the meter was 60 (day 1 + day 2 + day 3).

Day Distance
1    20
2    50
3    60

Now the question is: I want the output to look like this:

Day  Distance
1    20
2    30
3    10

That is, I want only the distance travelled on each individual day: the 1st day, the 2nd day and the 3rd day.

The answer can be in Hive, Pig or MapReduce.

Thanks

This is a running-totals style problem; you can solve it with this Hive query:

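-- b = mytable plus a synthetic day 0 with reading 0, so day 1 has a predecessor
with b as (
select 0 as d, 0 as dst
union all
select d, dst from mytable
)
-- pair each day with the previous day and subtract the readings
select a.d, a.dst - b.dst as new_dst
from mytable a join b on a.d = b.d + 1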
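
To make the join logic concrete, here is a minimal plain-Java sketch of the same idea outside Hive: index the readings by day, add a synthetic day 0 with reading 0 (the role of the `select 0 as d, 0 as dst` row), and subtract the reading of day d-1 from the reading of day d. The class name `SelfJoinSketch` and the method `dailyDistances` are illustrative names only, not part of Hive or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper mirroring the self-join: day d is matched with day d-1.
class SelfJoinSketch {

    // cumulative: day -> meter reading at the end of that day
    static Map<Integer, Integer> dailyDistances(Map<Integer, Integer> cumulative) {
        // "b" in the query: the original rows plus a synthetic (day 0, reading 0) row
        Map<Integer, Integer> b = new TreeMap<>(cumulative);
        b.put(0, 0);

        Map<Integer, Integer> result = new LinkedHashMap<>();
        // iterate "a" in day order so the output is ordered by day
        for (Map.Entry<Integer, Integer> a : new TreeMap<>(cumulative).entrySet()) {
            Integer previous = b.get(a.getKey() - 1); // join condition: a.d - b.d = 1
            if (previous != null) {                   // inner join: no predecessor, no output row
                result.put(a.getKey(), a.getValue() - previous);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> readings = new TreeMap<>();
        readings.put(1, 20);
        readings.put(2, 50);
        readings.put(3, 60);
        System.out.println(dailyDistances(readings)); // prints {1=20, 2=30, 3=10}
    }
}
```

As with the inner join, a missing day produces no row for the day that follows the gap, so this approach assumes the days are consecutive.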

You can use Hive's built-in windowing and analytics functions to get the desired results.

Here is one way.

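-- LAG's third argument is the default used when there is no previous row,
-- so day 1 keeps its own reading instead of relying on a hard-coded 20
SELECT day, distance - LAG(distance, 1, 0) OVER (ORDER BY day) AS daily_distance
FROM mytable;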
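
For readers new to window functions: `LAG(distance) OVER (ORDER BY day)` hands each row the value from the previous row in day order, and yields NULL on the first row. A minimal plain-Java sketch of that walk follows, assuming a previous reading of 0 before the first day, so that the first day's distance is simply its own reading. The class name `LagSketch` and the `{day, distance}` array layout are illustrative choices, not a Hive API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of what a LAG-based query computes.
class LagSketch {

    // rows: each element is {day, cumulativeDistance}
    // returns: {day, dailyDistance} pairs ordered by day
    static List<int[]> dailyDistances(List<int[]> rows) {
        List<int[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt((int[] r) -> r[0])); // ORDER BY day

        List<int[]> result = new ArrayList<>();
        int previous = 0; // assumed reading before the first day
        for (int[] row : sorted) {
            result.add(new int[] {row[0], row[1] - previous}); // distance - LAG(distance)
            previous = row[1];
        }
        return result;
    }

    public static void main(String[] args) {
        // Input deliberately out of order: the sort plays the role of ORDER BY.
        List<int[]> rows = Arrays.asList(
                new int[] {3, 60}, new int[] {1, 20}, new int[] {2, 50});
        for (int[] r : dailyDistances(rows)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```

Unlike a self-join on consecutive day numbers, this walk compares each row with whichever row precedes it in sort order, so a gap in the days still produces a row (covering the whole gap).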

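I tried it in MapReduce.

package hadoop;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;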

public class distance {

public static class disMapper extends Mapper<LongWritable,Text,IntWritable,IntWritable>
{
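    // Input line format: day<TAB>cumulative distance, e.g. 1<TAB>20
    // Previous reading; correct only if a single mapper sees all rows in day order
    int pValue=0;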
    IntWritable outkey=new IntWritable();
    IntWritable outvalue=new IntWritable();
    public void map(LongWritable key,Text values,Context context) throws IOException, InterruptedException
    {
        String cols[]=values.toString().split("\t");
        int dis=Integer.parseInt(cols[1])-pValue;
        outkey.set(Integer.parseInt(cols[0]));
        outvalue.set(dis);
        pValue=Integer.parseInt(cols[1]);
        context.write(outkey, outvalue);
    }
}

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf=new Configuration();
    // Job.getInstance replaces the deprecated Job constructor
    Job job = Job.getInstance(conf,"dfdeff");
    job.setJarByClass(distance.class);
    job.setMapperClass(disMapper.class);

    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setNumReduceTasks(0);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Exit code 0 on success, 1 on failure
    System.exit(job.waitForCompletion(true)?0:1);

}

}

The query below also works fine:

select day, CASE WHEN day = 1 THEN distance ELSE (distance - LAG(distance) over(ORDER BY day)) END AS dailyReading from table; 
