
How to write MapReduce code for a Hive query

How do I write MapReduce code for

  1. select * from tables
  2. a left outer join

I am asking because HiveQL is taking a long time: for 1 GB of data it takes nearly 10 minutes.

Also, how do the combiner and shuffle work internally?

1) Start with the EXPLAIN or EXPLAIN EXTENDED command, which shows how Hive translates a query into MapReduce jobs.

Hive mainly launches MapReduce jobs for operations such as data filtering, data aggregation (min, max, avg), joins/products and intersections of tables, sorting, etc. First learn how to implement these algorithms/patterns in MapReduce.
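To make the filtering and aggregation patterns concrete, here is a minimal pure-Python simulation. It is only a sketch of the programming model, not Hadoop code and not what Hive actually generates; the helper names (run_map_only, run_map_reduce) and the sample rows are made up for illustration.

```python
from collections import defaultdict

def run_map_only(records, mapper):
    """Map-only job: every record goes through the mapper; no shuffle, no reduce."""
    out = []
    for rec in records:
        out.extend(mapper(rec))
    return out

def run_map_reduce(records, mapper, reducer):
    """Map, group values by key (a stand-in for shuffle/sort), then reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key in sorted(groups):  # reducers see keys in sorted order
        out.extend(reducer(key, groups[key]))
    return out

def select_all_mapper(rec):
    # Like SELECT * : emit each row unchanged.
    yield rec

def high_salary_mapper(rec):
    # Like WHERE salary >= 80 : emit only matching rows.
    if rec[2] >= 80:
        yield rec

def dept_salary_mapper(rec):
    # Like GROUP BY dept : key each salary by its department.
    yield rec[1], rec[2]

def avg_reducer(key, values):
    # Like avg(salary) for one department.
    yield key, sum(values) / len(values)

# Hypothetical sample table: (name, dept, salary)
rows = [("alice", "eng", 100), ("bob", "eng", 80), ("carol", "ops", 60)]

all_rows = run_map_only(rows, select_all_mapper)
filtered = run_map_only(rows, high_salary_mapper)
averages = run_map_reduce(rows, dept_salary_mapper, avg_reducer)
```

The point of the sketch is that a scan or a filter needs no reduce phase at all, while an aggregation needs the rows grouped by key before the reducer runs.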

2) I recommend reading "Join Algorithm using Map-Reduce" for a better understanding of how to join datasets using MapReduce. Hive follows the same pattern to join tables (datasets).
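As a rough illustration of the join pattern, the following pure-Python sketch simulates a reduce-side left outer join: the map step tags each row with its source table and emits it under the join key, and the reduce step pairs left rows with right rows per key. This is one common approach, assumed here for illustration; it is not taken from the book or from Hive's generated plan, and the function names and sample tables are hypothetical.

```python
from collections import defaultdict

def tag_mapper(tag, row, key_index):
    # Emit (join key, (source tag, row)) so the reducer can tell the tables apart.
    return row[key_index], (tag, row)

def left_outer_join(left_rows, right_rows, left_key=0, right_key=0):
    # Width of a right-hand row, used to pad unmatched left rows with None.
    right_width = len(right_rows[0]) if right_rows else 0

    # Map phase + shuffle: group tagged rows from both tables by join key.
    shuffled = defaultdict(list)
    for row in left_rows:
        key, value = tag_mapper("L", row, left_key)
        shuffled[key].append(value)
    for row in right_rows:
        key, value = tag_mapper("R", row, right_key)
        shuffled[key].append(value)

    # Reduce phase: one call per key, combining left rows with right rows.
    joined = []
    for key in sorted(shuffled):
        lefts = [r for tag, r in shuffled[key] if tag == "L"]
        rights = [r for tag, r in shuffled[key] if tag == "R"]
        for l in lefts:
            if rights:
                for r in rights:
                    joined.append(l + r)
            else:
                # No match on the right: keep the left row, pad with None.
                joined.append(l + (None,) * right_width)
    return joined

# Hypothetical sample tables: users(id, name) and orders(user_id, item)
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]

result = left_outer_join(users, orders)
```

Keys that appear only in the right table produce no output, which is what makes this a left outer join rather than a full outer join.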

3) For the combiner, shuffle and sort, read chapter 6 of "Hadoop: The Definitive Guide" by Tom White (O'Reilly).
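Until you get to the book, a small simulation may help with the intuition. The sketch below is a pure-Python word count, not Hadoop internals: the combiner pre-aggregates each mapper's output locally, a partition function decides which reducer gets each key, and each reducer sees its keys in sorted order. The names, the CRC32-based partitioner and the sample splits are assumptions made for illustration.

```python
import zlib
from collections import defaultdict

def map_words(line):
    # Map: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Combiner: sum counts per key on the map side, before anything is shuffled.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partition(key, num_reducers):
    # Deterministic stand-in for a hash partitioner: same key, same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def word_count(splits, num_reducers=2, use_combiner=True):
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    shuffled_records = 0
    for split in splits:  # one simulated map task per input split
        pairs = []
        for line in split:
            pairs.extend(map_words(line))
        if use_combiner:
            pairs = combine(pairs)
        for key, value in pairs:  # shuffle: route each pair to its reducer
            partitions[partition(key, num_reducers)][key].append(value)
            shuffled_records += 1
    counts = {}
    for part in partitions:  # one simulated reduce task per partition
        for key in sorted(part):  # keys arrive sorted within a partition
            counts[key] = sum(part[key])
    return counts, shuffled_records

# Hypothetical input: two splits, each a list of lines
splits = [["a b a", "a c"], ["b b", "c a"]]

with_combiner = word_count(splits, use_combiner=True)
without_combiner = word_count(splits, use_combiner=False)
```

Both runs give the same counts, but with the combiner only 6 records cross the shuffle instead of 9. That reduction in shuffled data is the whole purpose of a combiner, and it only works because summing is associative and commutative.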


 