
Hadoop Ecosystem: Why do Pig/Hive need MapReduce?

There are a whole lot of Hadoop ecosystem pictures on the internet, and I struggle to understand how the tools work together.

E.g. in the attached picture, why are Pig and Hive based on MapReduce, whereas other tools like Spark or Storm sit directly on YARN?

Would you be so kind as to explain this?

Thanks! BR

[hadoop ecosystem diagram]

The picture shows Pig and Hive on top of MapReduce because MapReduce is the distributed computing engine they use: Pig and Hive queries are compiled down to and executed as MapReduce jobs. Pig and Hive are easier to work with because they provide a higher-level abstraction over raw MapReduce.
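To make "queries get executed as MapReduce jobs" concrete, here is a toy Python sketch (not Hive's actual planner) of what a HiveQL query like `SELECT word, COUNT(*) FROM docs GROUP BY word` roughly lowers to: a map phase, a shuffle that groups by key, and a reduce phase that aggregates.

```python
from collections import defaultdict

def map_phase(records):
    # Mapper: emit a (key, 1) pair for every word, like Hive's
    # map-side operator tree would for a GROUP BY key.
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key before they reach the reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate each group; here the aggregate is COUNT(*).
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data", "big cluster"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 1, 'cluster': 1}
```

In a real cluster the map and reduce phases run in parallel across many machines, and the shuffle moves data over the network; the structure, however, is the same.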

Now let's take a look at Spark/Storm/Flink on YARN in the picture. YARN is a cluster manager that allows various applications to run on top of it. Storm, Spark, and Flink are all examples of applications that can run on YARN, and MapReduce itself is also an application that runs on YARN, as the diagram shows. YARN handles the resource-management piece so that multiple applications can share the same cluster. (If you are interested in another example of a similar technology, check out Mesos.)
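The "resource management" idea can be sketched in a few lines of Python. This is a simplified illustration, not YARN's real API: the ResourceManager tracks free memory per node and grants container requests from several applications, which is how MapReduce and Spark end up sharing one cluster.

```python
class ResourceManager:
    """Toy scheduler: grants memory 'containers' to applications."""

    def __init__(self, nodes):
        self.free = dict(nodes)  # node name -> free memory in MB

    def allocate(self, app, mem_mb):
        # Grant the request on the first node with enough free memory.
        for node, free in self.free.items():
            if free >= mem_mb:
                self.free[node] -= mem_mb
                return (app, node, mem_mb)
        return None  # no capacity; a real scheduler would queue the request

rm = ResourceManager({"node1": 4096, "node2": 4096})
grants = [rm.allocate("mapreduce-job", 2048),
          rm.allocate("spark-app", 2048),
          rm.allocate("spark-app", 4096)]
print(grants)
# [('mapreduce-job', 'node1', 2048), ('spark-app', 'node1', 2048),
#  ('spark-app', 'node2', 4096)]
```

Real YARN adds queues, CPU (vcore) accounting, locality preferences, and preemption, but the core contract is the same: applications ask for containers, and one central scheduler decides where they run.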

Finally, at the bottom of the picture is HDFS. This is the distributed storage layer that all of the applications above use to store and access data. It provides block-level replication and therefore fault tolerance: if one machine dies, the data is still available elsewhere.
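A toy sketch of HDFS's storage model may help (heavily simplified; the real default block size is 128 MB and replica placement is rack-aware): a file is split into fixed-size blocks, and each block is copied to several DataNodes.

```python
BLOCK_SIZE = 4          # bytes; tiny on purpose, purely for illustration
REPLICATION = 2         # HDFS defaults to 3 replicas per block
DATANODES = ["dn1", "dn2", "dn3"]

def put_file(data):
    # Split the file into fixed-size blocks.
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    # Place replicas round-robin across DataNodes (real HDFS is rack-aware).
    placement = {
        idx: [DATANODES[(idx + r) % len(DATANODES)] for r in range(REPLICATION)]
        for idx in range(len(blocks))
    }
    return blocks, placement

blocks, placement = put_file(b"hello hdfs!")
print(blocks)      # [b'hell', b'o hd', b'fs!']
print(placement)   # {0: ['dn1', 'dn2'], 1: ['dn2', 'dn3'], 2: ['dn3', 'dn1']}
```

Because every block lives on more than one DataNode, losing any single node leaves at least one replica of every block readable, which is the fault tolerance the answer refers to.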

If you are interested in deeper-dives, check out the Apache Projects page.

