
MapReduce Sort By Python Tuples Numerically

I'm working with Python tuples and have a text file that looks like this:

(1,value1)
(2,value2)
(3,value3)
...
(100,value100)

How can I configure my MapReduce job to sort by the first key in the tuple as an integer?

My reduce job needs to output a sorted list of tuples, so I don't want to start replacing parentheses and commas with tabs. That's going to be a pain to translate back into tuples.

I'm running my Hadoop job from bash with the following parameters:

hadoop jar /usr/local/Cellar/hadoop/2.*/libexec/share/hadoop/tools/lib/hadoop-streaming-2*.jar \
    -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
    -D mapreduce.partition.keycomparator.options=-n \
    -reducer reducer.py \
    -input tuples.txt \
    -output sortedtuples

Thanks

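If the values in your tuples are integers, each line is already a valid Python expression, so you can parse it with the eval function and skip the removing/converting tasks. Keep in mind that eval executes whatever the file contains, so use it only on input you trust; ast.literal_eval is the safer choice for plain literals. Here is a working example of what you want: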

# Parse each line as a Python tuple, then sort by the first element.
with open('sourceFile.txt') as f:
    MyList = sorted(map(eval, f), key=lambda x: x[0])
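If the sorting has to happen inside the streaming job rather than in a single Python process, one possible approach is to leave the tuple text alone and only prepend its integer as a tab-separated key. Hadoop streaming treats the text before the first tab as the key, so the -n comparator option from the question would then apply to a plain number. The sketch below is only an illustration under that assumption: the names numeric_key, map_lines and reduce_lines are hypothetical, and the sorted() call merely stands in for Hadoop's shuffle so the example can run locally.

```python
def numeric_key(line):
    """Return the integer before the first comma of a line like '(12,value12)'."""
    return int(line.strip().lstrip("(").split(",", 1)[0])


def map_lines(lines):
    """Mapper step: yield 'key<TAB>original tuple text' for each non-blank line."""
    for line in lines:
        text = line.strip()
        if text:
            yield f"{numeric_key(text)}\t{text}"


def reduce_lines(lines):
    """Reducer step: drop the key prefix and yield the untouched tuple text."""
    for line in lines:
        if line.strip():
            yield line.rstrip("\n").split("\t", 1)[1]


# Local demonstration with made-up sample lines (not data from the question).
sample = ["(10,value10)\n", "(2,value2)\n", "(1,value1)\n"]
mapped = list(map_lines(sample))
# Stand-in for Hadoop's shuffle: sort numerically on the text before the tab.
shuffled = sorted(mapped, key=lambda kv: int(kv.split("\t", 1)[0]))
result = list(reduce_lines(shuffled))
print(result)  # prints ['(1,value1)', '(2,value2)', '(10,value10)']
```

In an actual streaming job, the mapper script would feed sys.stdin to map_lines and print each result, and reducer.py would do the same with reduce_lines. The tuples are never rewritten, so nothing has to be translated back.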

