
What's the best Python implementation for the MapReduce pattern?

What is the best Python implementation of MapReduce, whether a framework or a library? Ideally it would be about as good as Apache Hadoop but written in Python: well documented and easy to understand, a full implementation of the MapReduce pattern, highly scalable, highly stable, and lightweight.

I googled one called mincemeat, but I'm not sure about it. Are there any other well-known ones?

Thanks

There are some pieces here and there if you search for them, for example Octopy and Disco, as well as Hadoopy.

However, I don't believe that any of them can compete with Hadoop in terms of maturity, stability, scalability, performance, etc. For small cases they should suffice, but for something more "glorious", you have to stick to Hadoop.

Remember that you can still write map/reduce programs for Hadoop in Python/Jython.
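One common way to do that is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout. Here is a minimal word-count sketch under that assumption. The file name wc_streaming.py and the map/reduce command-line switch are my own choices for illustration, not anything Hadoop requires.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    records = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()  # skip blank lines
    )
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(argv):
    # Hypothetical switch: the same script acts as mapper or reducer.
    if len(argv) != 2 or argv[1] not in ("map", "reduce"):
        print("usage: wc_streaming.py map|reduce", file=sys.stderr)
        return
    step = map_lines if argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)


if __name__ == "__main__":
    main(sys.argv)
```

You can try it without a cluster by letting sort stand in for the shuffle phase: cat wc_input.txt | python wc_streaming.py map | sort | python wc_streaming.py reduce. On a cluster, the same two commands would be passed to the streaming jar through its -mapper and -reducer options; the jar's location depends on your installation.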

EDIT: I recently came across mrjob. It seems great, as it makes it easier to write map/reduce programs and then launch them on Hadoop or on Amazon's Elastic MapReduce platform. The article that brought the good news is here.

Another good option is Dumbo.

Below is the code to run a map/reduce job for word counting.

def mapper(key, value):
    # Emit (word, 1) for every word in the input line.
    for word in value.split():
        yield word, 1

def reducer(key, values):
    # Sum the counts collected for each word.
    yield key, sum(values)

if __name__ == "__main__":
    import dumbo
    dumbo.run(mapper, reducer)

To run it, just feed in your text file wc_input.txt for counting; the output is saved as wc_output.

 python -m dumbo wordcount.py -hadoop /path/to/hadoop -input wc_input.txt -output wc_output
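If you want to sanity-check a mapper and reducer of this shape before involving Hadoop, a tiny single-process driver is enough. The sketch below is not part of Dumbo: run_local is a hypothetical helper of my own, and using enumerate to supply the input keys is just a stand-in, since the word-count mapper ignores its key anyway.

```python
from collections import defaultdict


def mapper(key, value):
    # Emit (word, 1) for every word in the input line.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # Sum the counts collected for each word.
    yield key, sum(values)


def run_local(mapper, reducer, records):
    """Hypothetical helper: map -> shuffle -> reduce in a single process."""
    groups = defaultdict(list)
    # Map phase: run the mapper and group its output values by key.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce phase: visit the keys in sorted order, as a real shuffle would.
    results = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, groups[out_key]))
    return results


lines = ["the quick fox", "the lazy dog"]
counts = run_local(mapper, reducer, enumerate(lines))
print(counts)
# [('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]
```

Everything is held in memory here, so this is only for checking the logic on small inputs, not for real workloads.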

You should also look at Mrs: http://code.google.com/p/mrs-mapreduce/

It is particularly well-suited for computationally intensive iterative programs.

