
Are there any use cases where Hadoop MapReduce can do better than Apache Spark?

I agree that Spark is much better suited than MapReduce to iterative and interactive programming paradigms. I also agree that we can use HDFS, or any Hadoop data store such as HBase, as the storage layer for Spark.

Therefore, my question is: are there any real-world use cases where Hadoop MR is better than Apache Spark? Here "better" means better in terms of performance, throughput, or latency. Is Hadoop MR still a better choice than Spark for BATCH processing?

If so, can anyone please explain the advantages of Hadoop MR over Apache Spark? Please keep the scope of the discussion limited to the COMPUTATION LAYER.

As you said, Spark is better than Hadoop for iterative and interactive programming. But Spark needs a lot of memory; if there is not enough, it can easily throw an OOM (out-of-memory) exception. Hadoop handles that situation very well, because Hadoop has a good fault-tolerance mechanism.

Secondly, if data skew occurs, Spark may also collapse. I am comparing Spark and Hadoop on system robustness, because that decides whether a job succeeds.
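To make the data skew point concrete, here is a minimal sketch in plain Python (no Spark or Hadoop involved). It assumes records are assigned to partitions by hashing the key, which is a common default for shuffles; the helper `partition_sizes`, the key sets, and the partition count of 8 are all made up for illustration.

```python
from collections import Counter

def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition under hash partitioning."""
    counts = Counter(hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

# Hypothetical workloads, 10,000 records each.
# Uniform: 1,000 distinct keys, each appearing 10 times.
uniform_keys = list(range(1000)) * 10
# Skewed: the same 1,000 keys once each, plus one hot key (7) repeated 9,000 times.
skewed_keys = list(range(1000)) + [7] * 9000

uniform = partition_sizes(uniform_keys, 8)
skewed = partition_sizes(skewed_keys, 8)

print(uniform)  # [1250, 1250, 1250, 1250, 1250, 1250, 1250, 1250]
print(skewed)   # [125, 125, 125, 125, 125, 125, 125, 9125]
```

With the skewed keys, a single partition receives 9,125 of the 10,000 records, so the one task processing it needs far more memory than the others. That is the situation in which a memory-hungry job is most likely to fail.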

I recently tested Spark and Hadoop performance with some benchmarks. According to the results, Spark does not outperform Hadoop on some workloads, e.g. k-means and PageRank. Memory may be the limiting factor for Spark.


 