
Is there a simple Python map-reduce framework that uses the regular filesystem?

I have a few problems which may apply well to the Map-Reduce model. I'd like to experiment with implementing them, but at this stage I don't want to go to the trouble of installing a heavyweight system like Hadoop or Disco.

Is there a lightweight Python framework for map-reduce which uses the regular filesystem for input, temporary files, and output?
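To make the requirement concrete, the workflow described above (input files, temporary files, and an output file, all on the ordinary filesystem) can be sketched in plain Python with only the standard library. This is a serial illustration, not any particular framework's API: `run_job`, `map_words`, `reduce_counts`, and the `part-N.jsonl` file names are hypothetical, and keys are assumed to be strings.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def map_words(line):
    """Map step: emit (word, 1) for every word in a line of text."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def run_job(input_paths, output_path, mapper, reducer, num_partitions=2):
    """Run map, shuffle, and reduce serially, spilling to temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: one JSON-lines temporary file per partition.
        part_paths = [Path(tmp) / f"part-{i}.jsonl" for i in range(num_partitions)]
        parts = [p.open("w", encoding="utf-8") for p in part_paths]
        try:
            # Map phase: route each (key, value) pair to a partition by its key.
            for path in input_paths:
                with open(path, encoding="utf-8") as src:
                    for line in src:
                        for key, value in mapper(line):
                            # Deterministic partitioner (assumes string keys);
                            # the built-in hash() of str varies between runs.
                            index = sum(key.encode("utf-8")) % num_partitions
                            parts[index].write(json.dumps([key, value]) + "\n")
        finally:
            for handle in parts:
                handle.close()

        # Reduce phase: group each partition by key, then reduce each group.
        with open(output_path, "w", encoding="utf-8") as out:
            for part_path in part_paths:
                groups = defaultdict(list)
                with part_path.open(encoding="utf-8") as part:
                    for record in part:
                        key, value = json.loads(record)
                        groups[key].append(value)
                for key in sorted(groups):
                    out_key, out_value = reducer(key, groups[key])
                    out.write(f"{out_key}\t{out_value}\n")
```

Calling `run_job(["a.txt", "b.txt"], "counts.tsv", map_words, reduce_counts)` would write one tab-separated `word<TAB>count` line per distinct word. Every occurrence of a key lands in the same partition, so each partition can be reduced independently of the others.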

A Coursera course on big data suggests using these lightweight Python map-reduce frameworks:

To get you started very quickly, try this example:

https://github.com/michaelfairley/mincemeatpy/zipball/v0.1.2

(hint: for [server address] in this example use localhost)

http://pythonhosted.org/mrjob/ is a great way to get started quickly on your local machine. All you need is a simple:

pip install mrjob

http://jsmapreduce.com/ -- in-browser mapreduce; in Python or Javascript; nothing to install

Check out Apache Spark. It is written mostly in Scala and runs on the JVM, but it also has a Python API (PySpark). You can try it locally on your machine and then, when you need to, distribute your computation over a cluster.

MockMR - https://github.com/sjtrny/mockmr

It's meant for educational use. It does not currently run in parallel, but it accepts standard Python objects as input and output.
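For a sense of what a serial map-reduce over ordinary Python objects involves, here is a minimal sketch. It is not MockMR's API (see the repository for that); `run_map_reduce`, `max_temp_mapper`, `max_temp_reducer`, and the sample readings are made up for illustration.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Apply mapper to every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # A mapper may emit zero or more (key, value) pairs per record.
        for key, value in mapper(record):
            groups[key].append(value)
    # Runs serially, one key at a time; nothing here is parallel.
    return {key: reducer(key, values) for key, values in groups.items()}


def max_temp_mapper(reading):
    """Emit (city, temperature) from a (city, temperature) tuple."""
    city, temperature = reading
    yield city, temperature


def max_temp_reducer(city, temperatures):
    """Keep the highest temperature seen for a city."""
    return max(temperatures)


# Made-up sample data: plain Python tuples in, a plain dict out.
readings = [("Oslo", 3), ("Cairo", 31), ("Oslo", 7), ("Cairo", 28)]
hottest = run_map_reduce(readings, max_temp_mapper, max_temp_reducer)
```

Because the inputs and outputs are ordinary lists and dicts, the mapper and reducer can be tried out in a REPL before moving to a framework that handles files or parallelism.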

So this was asked ages ago, but I worked on a full implementation of mapreduce over the weekend: remap.

https://github.com/gtoonstra/remap

It's pretty easy to install, with minimal dependencies. If all goes well, you should be able to do a test run in 5 minutes.

The entire processing pipeline works, but submitting and monitoring jobs is still being worked on.

