
Matrix computation using Hadoop MapReduce

I have a matrix with around 10,000 rows. I wrote code that takes one row per iteration, does some long matrix computations, and returns one double per row. Since the number of operations per row is very large, the code takes a long time to run. I'm thinking of implementing it with MapReduce, but I'm not sure whether that's possible. The main idea is to split the matrix rows across different nodes, run the jobs independently, and combine the outputs into a list of numbers. From my understanding, a mapper alone could do this job. Am I right? Is it possible, or is there a better approach? Thanks in advance. By the way, the code is in Java.
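For illustration, a minimal sketch of that map-only idea, assuming the matrix is stored as a text file with one whitespace-separated row per line; the sum-of-squares is just a stand-in for the real per-row computation:

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only job: each mapper computes one double per matrix row and
// emits it directly; the driver would call job.setNumReduceTasks(0).
public class RowComputeMapper
        extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {

    @Override
    protected void map(LongWritable offset, Text row, Context context)
            throws IOException, InterruptedException {
        String[] fields = row.toString().trim().split("\\s+");
        double sum = 0.0;
        for (String field : fields) {
            double v = Double.parseDouble(field);
            sum += v * v; // placeholder for the long matrix computation
        }
        context.write(offset, new DoubleWritable(Math.sqrt(sum)));
    }
}
```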

This seems possible - some points for consideration:

You might want to run an identity mapper (one that passes each input record through to the reducer unchanged) and do the row calculation in the reducer. Doing the calculation map-side will probably still cause all the calculations to be done on a single node: it's quite possible that your 10,000-row matrix is smaller than a single input split, in which case the whole file goes to one mapper.
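A sketch of that arrangement, under the same text-file assumptions as above. The stock org.apache.hadoop.mapreduce.Mapper already behaves as an identity mapper, so only the reducer needs to be written:

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives (byte offset, row text) pairs passed through by the identity
// mapper and runs the heavy per-row computation reduce-side.
public class RowComputeReducer
        extends Reducer<LongWritable, Text, LongWritable, DoubleWritable> {

    @Override
    protected void reduce(LongWritable key, Iterable<Text> rows, Context context)
            throws IOException, InterruptedException {
        for (Text row : rows) { // normally one row per key; iterate to be safe
            String[] fields = row.toString().trim().split("\\s+");
            double sum = 0.0;
            for (String field : fields) {
                double v = Double.parseDouble(field);
                sum += v * v; // placeholder for the real computation
            }
            context.write(key, new DoubleWritable(Math.sqrt(sum)));
        }
    }
}
```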

You'll want to run a large number of reducers to ensure the job is parallelized across your cluster's nodes. The default partitioner will handle sending the input rows to different reducers, assuming your rows are not fixed width. With fixed-width rows, every byte-offset key is a multiple of the row width, which can hash unevenly across the reducers; in that case, run a custom mapper that uses a running counter as the output key instead of the default byte offset of the input row.
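A sketch of such a mapper. One caveat: the counter restarts at zero in each input split, so keys can repeat across splits; that is harmless here as long as the reducer iterates over every value for a key, as the sketch above does.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Replaces the byte-offset key with a running counter so that
// fixed-width rows still spread evenly across the reducers.
public class CounterKeyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    private long counter = 0;
    private final LongWritable outKey = new LongWritable();

    @Override
    protected void map(LongWritable offset, Text row, Context context)
            throws IOException, InterruptedException {
        outKey.set(counter++); // counter restarts per split, so keys may repeat
        context.write(outKey, row);
    }
}
```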

To bring all the results back together, you'll need to run a second MR job with a single reducer.
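Putting the pieces together, a driver along these lines chains the two jobs; the class names come from the sketches above, and the paths and reducer count are placeholders to tune. The second job can rely entirely on the default identity mapper and reducer, since it only needs to funnel the per-row results through one reducer:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job 1: compute one double per row, spread over many reducers.
        Job compute = Job.getInstance(conf, "row-compute");
        compute.setJarByClass(MatrixDriver.class);
        compute.setMapperClass(CounterKeyMapper.class);
        compute.setReducerClass(RowComputeReducer.class);
        compute.setMapOutputKeyClass(LongWritable.class);
        compute.setMapOutputValueClass(Text.class);
        compute.setOutputKeyClass(LongWritable.class);
        compute.setOutputValueClass(DoubleWritable.class);
        compute.setNumReduceTasks(50); // tune to the cluster size
        FileInputFormat.addInputPath(compute, new Path(args[0]));
        FileOutputFormat.setOutputPath(compute, new Path(args[1]));
        if (!compute.waitForCompletion(true)) System.exit(1);

        // Job 2: identity map/reduce with a single reducer gathers all
        // per-row results into one output file.
        Job collect = Job.getInstance(conf, "collect-results");
        collect.setJarByClass(MatrixDriver.class);
        collect.setNumReduceTasks(1);
        FileInputFormat.addInputPath(collect, new Path(args[1]));
        FileOutputFormat.setOutputPath(collect, new Path(args[2]));
        System.exit(collect.waitForCompletion(true) ? 0 : 1);
    }
}
```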
