
Is there a simple Python map-reduce framework that uses the regular filesystem?

Original question posted 2013-04-18 21:04:42 (score 8, 6 answers). Tags: python, mapreduce

I have a few problems that may lend themselves well to the Map-Reduce model. I'd like to experiment with implementing them, but at this stage I don't want to go to the trouble of installing a heavyweight system like Hadoop or Disco.

Is there a lightweight Python framework for map-reduce which uses the regular filesystem for input, temporary files, and output?
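For context, the kind of setup the question describes (plain files for input, intermediate data, and output, with no cluster) can be sketched with only the Python standard library. This is a minimal, sequential sketch, not any of the frameworks named in the answers; `run_job`, `word_count_map`, and `word_count_reduce` are hypothetical names, input is assumed to be `*.txt` files, and keys are assumed to be strings so they survive the JSON round trip through the intermediate files.

```python
import json
import tempfile
import zlib
from collections import defaultdict
from pathlib import Path


def word_count_map(line):
    # Example map function: emit (word, 1) for each whitespace-separated word.
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(key, values):
    # Example reduce function: total the counts collected for one word.
    return sum(values)


def run_job(input_dir, output_dir, mapfn, reducefn, num_partitions=4):
    """Sequential map-reduce over *.txt files, using plain files for every stage."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp_name:
        tmp = Path(tmp_name)
        # Map phase: write one intermediate JSON file per (input file, partition).
        for i, path in enumerate(sorted(input_dir.glob("*.txt"))):
            buckets = defaultdict(list)
            with path.open(encoding="utf-8") as f:
                for line in f:
                    for key, value in mapfn(line):
                        # crc32 is stable across processes, unlike the salted str hash().
                        part = zlib.crc32(str(key).encode("utf-8")) % num_partitions
                        buckets[part].append([key, value])
            for part, pairs in buckets.items():
                chunk = tmp / f"map-{i}-part-{part}.json"
                chunk.write_text(json.dumps(pairs), encoding="utf-8")
        # Reduce phase: gather each partition's chunks, group values by key, reduce.
        for part in range(num_partitions):
            grouped = defaultdict(list)
            for chunk in sorted(tmp.glob(f"map-*-part-{part}.json")):
                for key, value in json.loads(chunk.read_text(encoding="utf-8")):
                    grouped[key].append(value)
            out_path = output_dir / f"part-{part:05d}.tsv"
            with out_path.open("w", encoding="utf-8") as out:
                for key in sorted(grouped):
                    out.write(f"{key}\t{reducefn(key, grouped[key])}\n")
```

With a few `*.txt` files in `input/`, calling `run_job("input", "output", word_count_map, word_count_reduce)` writes tab-separated word/count lines to `output/part-00000.tsv` through `output/part-00003.tsv`. Because partitioning uses a stable checksum rather than `hash()`, the map loop and the reduce loop could later be split across worker processes without changing the file layout.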

6 answers

A Coursera course dedicated to big data suggests using these lightweight Python Map-Reduce frameworks:

To get you started very quickly, try this example:

https://github.com/michaelfairley/mincemeatpy/zipball/v0.1.2

(Hint: in this example, use localhost for [server address].)
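The mincemeat.py example linked above follows a simple pattern: the input is a dict assigned to a server's `datasource`, and the job is a `mapfn(key, value)` generator plus a `reducefn(key, values)` function. Since mincemeat.py itself is an older, Python 2-era client/server tool, the sketch below keeps that shape but runs the two functions through `run_locally`, a hypothetical in-process stand-in that is not part of mincemeat.py. The commented lines at the end show roughly how the project's README wires the same functions to a real `mincemeat.Server`; treat them as an approximation and check them against the linked v0.1.2 release.

```python
from collections import defaultdict

# Input is a dict keyed by record id, the shape mincemeat.py expects for datasource.
data = [
    "Humpty Dumpty sat on a wall",
    "Humpty Dumpty had a great fall",
]
datasource = dict(enumerate(data))


def mapfn(k, v):
    # k is the datasource key (a line number here), v is the line of text.
    for w in v.split():
        yield w, 1


def reducefn(k, vs):
    # vs holds every value emitted for key k; return the reduced result.
    return sum(vs)


def run_locally(datasource, mapfn, reducefn):
    """Hypothetical stand-in for mincemeat.Server: map, group by key, reduce in-process."""
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    return {key: reducefn(key, values) for key, values in grouped.items()}


results = run_locally(datasource, mapfn, reducefn)
print(results)

# With mincemeat.py itself, the same functions are attached to a server instead:
#   s = mincemeat.Server()
#   s.datasource = datasource
#   s.mapfn = mapfn
#   s.reducefn = reducefn
#   results = s.run_server(password="changeme")
# and each worker is started with: python mincemeat.py -p changeme localhost
```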

mrjob (http://pythonhosted.org/mrjob/) is great for getting started quickly on your local machine; basically all you need is:

pip install mrjob

http://jsmapreduce.com/: in-browser MapReduce in Python or JavaScript, with nothing to install.

Check out Apache Spark. It is written in Scala and runs on the JVM, but it also has a Python API. You can try it locally on your own machine and then, when you need to, easily distribute your computation over a cluster.

MockMR - https://github.com/sjtrny/mockmr

It is meant for educational use. It does not currently run in parallel, but it accepts standard Python objects as input and output.

This was asked ages ago, but I worked on a full implementation of MapReduce over the weekend: remap.

https://github.com/gtoonstra/remap

It is easy to install and has minimal dependencies; if all goes well, you should be able to do a test run within 5 minutes.

The entire processing pipeline works, but submitting and monitoring jobs is still being worked on.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

6 answers (score, date posted):

solution1: 11, 2013-04-24 08:29:49
solution2: 5, 2013-11-27 22:33:08
solution3: 3, 2014-02-08 20:54:44
solution4: 1, 2014-02-10 18:35:12
solution5: 1, 2018-06-04 05:09:08
solution6: 0, 2015-06-25 11:00:42
