MapReduce Sort By Python Tuples Numerically

Question

I'm working with Python tuples and have a text file that looks like

(1,value1)
(2,value2)
(3,value3)
...
(100,value100)

How can I configure my MapReduce job to sort by the first key in the tuple as an integer?

My reduce job needs to output a sorted list of tuples, so I don't want to start replacing parentheses and commas with tabs. That's going to be a pain to translate back into tuples.

I'm running my hadoop job from bash with the following parameters:

hadoop jar /usr/local/Cellar/hadoop/2.*/libexec/share/hadoop/tools/lib/hadoop-streaming-2*.jar \
  -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapreduce.partition.keycomparator.options=-n \
  -reducer reducer.py \
  -input tuples.txt \
  -output sortedtuples

Thanks

Answer 1

If your values are integers (or any other valid Python literal), you can parse each line with the eval function, so there is no need to strip or convert the parentheses and commas. Here is a working example of what you want:

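# Read every line of the source file; the with block closes it automatically
with open('sourceFile.txt') as f:
    L = f.readlines()

# Parse each line into a tuple, then sort numerically by the first element
MyList = sorted(map(eval, L), key=lambda x: x[0])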
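
If the values are not valid Python expressions (a bare value1 would make eval raise a NameError), one alternative is to leave each line as text and extract only the integer key for sorting. The following is a minimal sketch rather than a tested Hadoop job: tuple_key and sort_tuple_lines are hypothetical helper names, and it assumes every non-blank line starts with an opening parenthesis, an integer, and a comma.

```python
import re

# Matches a line such as "(12,value12)" and captures the leading integer.
# Assumed format: optional whitespace, "(", an integer, then a comma.
_KEY_PATTERN = re.compile(r"^\s*\(\s*(-?\d+)\s*,")


def tuple_key(line):
    """Return the first tuple element of `line` as an int."""
    match = _KEY_PATTERN.match(line)
    if match is None:
        raise ValueError("not a tuple line: %r" % line)
    return int(match.group(1))


def sort_tuple_lines(lines):
    """Sort tuple-formatted lines numerically by their first element.

    Blank lines are skipped; every other line is kept as text,
    so the parentheses and commas never need to be rebuilt.
    """
    stripped = (line.strip() for line in lines)
    return sorted((line for line in stripped if line), key=tuple_key)
```

In a streaming reducer this could be applied to standard input, for example print("\n".join(sort_tuple_lines(sys.stdin))) after importing sys. That only produces one globally sorted output if the job runs with a single reducer, since each reducer sorts just the lines it receives.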

solution1
1 2016-05-01 22:26:58
