
Hadoop Ecosystem: Why do Pig/Hive need MapReduce?

There are a whole lot of Hadoop ecosystem pictures on the internet, and I struggle to understand how the tools work together.

E.g. in the attached picture, why are Pig and Hive based on MapReduce, whereas other tools like Spark or Storm sit directly on YARN?

Would you be so kind as to explain this?

Thanks! BR

[hadoop ecosystem diagram]

The picture shows Pig and Hive on top of MapReduce because MapReduce is the distributed computing engine they use: Pig and Hive queries are compiled down to and executed as MapReduce jobs. Pig and Hive are easier to work with because they provide a higher-level abstraction over raw MapReduce.
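To make "queries get executed as MapReduce jobs" concrete, here is a toy Python sketch (not Hive's actual planner) of what a HiveQL query like `SELECT word, COUNT(*) FROM docs GROUP BY word` roughly lowers to: a map phase, a shuffle that groups by key, and a reduce phase that aggregates.

```python
from collections import defaultdict

def map_phase(records):
    # Mapper: emit a (key, 1) pair for every word, like Hive's
    # map-side operator tree would for a GROUP BY key.
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key before they reach the reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate each group; here the aggregate is COUNT(*).
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data", "big cluster"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 1, 'cluster': 1}
```

In a real cluster the map and reduce phases run in parallel across many machines, and the shuffle moves data over the network; the structure, however, is the same.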

Now let's take a look at Spark/Storm/Flink on YARN in the picture. YARN is a cluster manager that allows various applications to run on top of it. Storm, Spark, and Flink are all examples of applications that can run on YARN, and MapReduce itself is also an application that runs on YARN, as the diagram shows. YARN handles the resource-management piece so that multiple applications can share the same cluster. (If you are interested in another example of a similar technology, check out Mesos.)
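The "resource management" idea can be sketched in a few lines of Python. This is a simplified illustration, not YARN's real API: the ResourceManager tracks free memory per node and grants container requests from several applications, which is how MapReduce and Spark end up sharing one cluster.

```python
class ResourceManager:
    """Toy scheduler: grants memory 'containers' to applications."""

    def __init__(self, nodes):
        self.free = dict(nodes)  # node name -> free memory in MB

    def allocate(self, app, mem_mb):
        # Grant the request on the first node with enough free memory.
        for node, free in self.free.items():
            if free >= mem_mb:
                self.free[node] -= mem_mb
                return (app, node, mem_mb)
        return None  # no capacity; a real scheduler would queue the request

rm = ResourceManager({"node1": 4096, "node2": 4096})
grants = [rm.allocate("mapreduce-job", 2048),
          rm.allocate("spark-app", 2048),
          rm.allocate("spark-app", 4096)]
print(grants)
# [('mapreduce-job', 'node1', 2048), ('spark-app', 'node1', 2048),
#  ('spark-app', 'node2', 4096)]
```

Real YARN adds queues, CPU (vcore) accounting, locality preferences, and preemption, but the core contract is the same: applications ask for containers, and one central scheduler decides where they run.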

Finally, at the bottom of the picture is HDFS. This is the distributed storage layer that all of the applications above use to store and access data. It provides block-level replication and therefore fault tolerance: if one machine dies, the data is still available elsewhere.
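A toy sketch of HDFS's storage model may help (heavily simplified; the real default block size is 128 MB and replica placement is rack-aware): a file is split into fixed-size blocks, and each block is copied to several DataNodes.

```python
BLOCK_SIZE = 4          # bytes; tiny on purpose, purely for illustration
REPLICATION = 2         # HDFS defaults to 3 replicas per block
DATANODES = ["dn1", "dn2", "dn3"]

def put_file(data):
    # Split the file into fixed-size blocks.
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    # Place replicas round-robin across DataNodes (real HDFS is rack-aware).
    placement = {
        idx: [DATANODES[(idx + r) % len(DATANODES)] for r in range(REPLICATION)]
        for idx in range(len(blocks))
    }
    return blocks, placement

blocks, placement = put_file(b"hello hdfs!")
print(blocks)      # [b'hell', b'o hd', b'fs!']
print(placement)   # {0: ['dn1', 'dn2'], 1: ['dn2', 'dn3'], 2: ['dn3', 'dn1']}
```

Because every block lives on more than one DataNode, losing any single node leaves at least one replica of every block readable, which is the fault tolerance the answer refers to.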

If you are interested in deeper-dives, check out the Apache Projects page.

