I have a few problems that seem well suited to the MapReduce model. I'd like to experiment with implementing them, but at this stage I don't want the trouble of installing a heavyweight system like Hadoop or Disco.
Is there a lightweight Python framework for map-reduce which uses the regular filesystem for input, temporary files, and output?
A Coursera course on big data suggests these lightweight Python MapReduce frameworks:
To get you started very quickly, try this example:
https://github.com/michaelfairley/mincemeatpy/zipball/v0.1.2
(hint: use localhost for [server address] in this example)
http://pythonhosted.org/mrjob/ is great for getting started quickly on your local machine; basically, all you need is a simple:
pip install mrjob
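In mrjob you write a job class whose mapper and reducer are generators yielding key/value pairs (in real mrjob that class subclasses `mrjob.job.MRJob` and is launched with `MRWordCount.run()`). As a rough, dependency-free sketch of that protocol, the toy `run_local` driver below is my own stand-in for mrjob's runner, which additionally handles partitioning, multiple processes, and Hadoop:

```python
from itertools import groupby
from operator import itemgetter

class WordCount:
    """mrjob-style job: mapper and reducer written as generators."""
    def mapper(self, _, line):
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        yield word, sum(counts)

def run_local(job, lines):
    """Toy single-process driver: map, shuffle (sort + group), then reduce."""
    pairs = [kv for line in lines for kv in job.mapper(None, line)]
    pairs.sort(key=itemgetter(0))          # shuffle: bring equal keys together
    out = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        out.extend(job.reducer(key, (v for _, v in group)))
    return out
```

For example, `run_local(WordCount(), ["a b a"])` returns `[("a", 2), ("b", 1)]`; with real mrjob, the same mapper/reducer bodies run unchanged on local files or a Hadoop cluster.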
http://jsmapreduce.com/ -- in-browser MapReduce, in Python or JavaScript; nothing to install
Check out Apache Spark. It is written in Scala but also has a Python API (PySpark). You can try it locally on your machine and then, when you need to, easily distribute your computation over a cluster.
MockMR - https://github.com/sjtrny/mockmr
It's meant for educational use. It does not currently run in parallel, but it accepts standard Python objects as input and output.
This was asked ages ago, but I worked on a full implementation of MapReduce over the weekend: remap.
https://github.com/gtoonstra/remap
It's pretty easy to install with minimal dependencies; if all goes well, you should be able to complete a test run in five minutes. The entire processing pipeline works, but submitting and monitoring jobs is still a work in progress.