
Are there any use cases where hadoop map-reduce can do better than apache spark?

I agree that Spark handles iterative and interactive programming paradigms much better than map-reduce. I also agree that we can use HDFS, or any Hadoop data store such as HBase, as a storage layer for Spark.

Therefore, my question is: are there any real-world use cases where Hadoop MR is better than Apache Spark? Here "better" is meant in terms of performance, throughput, and latency. Is Hadoop MR still the better choice for BATCH processing than Spark?

If so, can anyone please explain the advantages of Hadoop MR over Apache Spark? Please keep the entire scope of the discussion to the COMPUTATION LAYER.

As you said, for iterative and interactive programming, Spark is better than Hadoop. But Spark has a huge need for memory: if memory is insufficient, it easily throws an OOM exception, whereas Hadoop handles this situation very well because it has a good fault-tolerance mechanism and spills intermediate data to disk rather than depending on RAM.
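As a hedged illustration of the memory-pressure point above: Spark does expose knobs for trading speed against memory headroom. The flags below are standard Spark configuration options, but the values are purely illustrative, not recommendations, and `my_job.py` is a hypothetical application:

```shell
# Illustrative spark-submit settings that give Spark more room before OOM:
# less of the heap is reserved for Spark's unified memory region, and
# cached data can be evicted in favor of execution memory.
spark-submit \
  --executor-memory 4g \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.5 \
  my_job.py
```

Persisting RDDs with `StorageLevel.MEMORY_AND_DISK` instead of the memory-only default is another common way to let Spark degrade gracefully (spilling to disk, MapReduce-style) rather than fail outright.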

Secondly, if data skew ("data tilt") occurs, Spark may also collapse. I am comparing Spark and Hadoop on system robustness here, because that decides the success or failure of a job.
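For readers unfamiliar with the data-skew problem: one standard mitigation is "key salting", where a hot key is split across several salted sub-keys so that no single reducer/partition receives all of its records, and partial results are merged in a second stage. This is a minimal pure-Python sketch of the idea, not actual Spark or MapReduce code:

```python
import random
from collections import defaultdict

N_SALTS = 4  # number of sub-keys a hot key is spread across (illustrative)

def salted_word_count(records):
    """Two-stage aggregation: first group by (key, salt), then re-merge by key."""
    # Stage 1: each record lands in one of N_SALTS buckets per key,
    # so a hot key's load is spread across several partial counters.
    stage1 = defaultdict(int)
    for key in records:
        salt = random.randrange(N_SALTS)
        stage1[(key, salt)] += 1
    # Stage 2: strip the salt and merge the partial counts per key.
    stage2 = defaultdict(int)
    for (key, _salt), count in stage1.items():
        stage2[key] += count
    return dict(stage2)

# "a" is the hot key here; its 100 records are spread over up to 4 buckets.
print(salted_word_count(["a"] * 100 + ["b"] * 3))
```

In Spark this same pattern is typically expressed as a `map` that appends a random suffix to the key, a first `reduceByKey`, a `map` that strips the suffix, and a second `reduceByKey`.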

Recently I tested Spark and Hadoop performance using some benchmarks; according to the results, Spark's performance is not better than Hadoop's on some workloads, e.g. k-means and PageRank. Maybe memory is the limitation for Spark.
