When should one use MapReduce instead of Pig/Hive?

I have no problem understanding that Hive and Pig make a programmer's job easier. But are there limitations, i.e. cases where one cannot use them and has to rely on MapReduce?

If asked this question in an interview, what should the answer be?

As Chirag points out, with MR you get more low-level control, and thus more potential for optimization. I'd also like to add:

  1. Pig and Hive are geared more toward scripting, which makes jobs more volatile and harder to debug. Setting up proper logging and monitoring in MR allows for more robust programs.

  2. You don't have to stick to plain Java MR to do MR. Frameworks like Scalding and Cascading streamline a huge amount of the work while still giving you the flexibility to drop down to the lower levels for optimization. In fact, Scalding is about the most concise framework you can get, more concise than Pig and Hive, mainly by virtue of being written in Scala.
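
To make point 1 concrete: in a hand-written MR job the mapper is ordinary code, so it can skip bad input and record what it skipped instead of failing the whole job. The sketch below is a plain-Python, in-memory simulation, not the Hadoop API; the `counters` object only mimics the role of Hadoop job counters, and the `user,action` record format and the counter names are hypothetical.

```python
from collections import Counter, defaultdict


def map_records(lines, counters):
    """Mapper: emit (user, 1) for each well-formed 'user,action' line.

    Malformed lines are skipped and counted rather than crashing the job,
    similar in spirit to incrementing a Hadoop job counter.
    """
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or not parts[0]:
            counters["MALFORMED_RECORDS"] += 1
            continue
        counters["GOOD_RECORDS"] += 1
        yield parts[0], 1


def reduce_counts(pairs):
    """Reducer: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


counters = Counter()
lines = ["alice,click", "bob,view", "garbage", "alice,view", ",click"]
result = reduce_counts(map_records(lines, counters))
print(result)          # {'alice': 2, 'bob': 1}
print(dict(counters))  # {'GOOD_RECORDS': 3, 'MALFORMED_RECORDS': 2}
```

Inspecting the counters after a run shows how much input was dropped, which is the kind of visibility the answer is referring to.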

With MapReduce we have more control, so we can tune the job ourselves to increase performance.
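One example of that control is deciding yourself how map output gets pre-aggregated before the shuffle. The following minimal Python sketch is an in-memory simulation, not Hadoop code. It compares a plain word-count mapper with one that does in-mapper combining, and the `shuffled` count stands in for the number of records that would cross the network. Pig and Hive can apply similar map-side aggregation automatically in some cases, so this shows the kind of tuning MR puts in your hands rather than something the higher-level tools can never do.

```python
from collections import Counter


def map_words(lines):
    """Plain mapper: one (word, 1) pair per word occurrence."""
    for line in lines:
        for word in line.split():
            yield word, 1


def map_words_combined(lines):
    """Mapper with in-mapper combining: aggregate locally, emit once per word."""
    local = Counter()
    for line in lines:
        local.update(line.split())
    yield from local.items()


def shuffle_and_reduce(pairs):
    """Simulated shuffle + reduce: count records received and sum per key."""
    totals = Counter()
    shuffled = 0
    for word, count in pairs:
        shuffled += 1
        totals[word] += count
    return dict(totals), shuffled


split = ["to be or not to be", "to be is to do"]
plain, plain_shuffled = shuffle_and_reduce(map_words(split))
combined, combined_shuffled = shuffle_and_reduce(map_words_combined(split))
assert plain == combined  # same answer either way
print(plain_shuffled, combined_shuffled)  # 11 6
```

Both versions produce identical totals; only the volume of intermediate data differs (11 records versus 6 here).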

The skills of the team should also be considered (for example, what if they are only proficient in Java?).

I'm also not sure that everything can be expressed in Hive or Pig (for example, unstructured data).
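For data without a fixed schema, a custom mapper can apply whatever parsing logic the input needs. Below is a minimal Python sketch, again an in-memory simulation rather than Hadoop code. It assumes a hypothetical free-form log in which error lines contain `ERROR [component]` somewhere in the text. Hive and Pig do offer regex-based extraction as well, so the advantage of a custom mapper grows with how irregular the input is.

```python
import re
from collections import Counter

# Hypothetical pattern: an ERROR marker followed by a bracketed component name.
ERROR_PATTERN = re.compile(r"ERROR\s+\[(?P<component>[\w.-]+)\]")


def map_errors(lines):
    """Mapper: emit (component, 1) for every line containing an ERROR marker."""
    for line in lines:
        match = ERROR_PATTERN.search(line)
        if match:
            yield match.group("component"), 1


def reduce_counts(pairs):
    """Reducer: sum the values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


logs = [
    "2024-01-01 12:00:01 ERROR [db] connection reset by peer",
    "user bob logged in from 10.0.0.7",
    "12:00:05 WARN [cache] eviction storm",
    "ERROR [db] timeout after 30s; retrying",
    "stack trace follows: ERROR [auth-service] token expired",
]
errors_by_component = reduce_counts(map_errors(logs))
print(errors_by_component)  # {'db': 2, 'auth-service': 1}
```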

The link below should be useful:

http://blog.mortardata.com/post/60274287605/pig-vs-mapreduce
