
Create custom Writable key/value type in Python for Hadoop MapReduce?

I have worked with Hadoop MR for quite some time and have created and used custom (extended) Writable classes, including MapWritable. Now I need to translate an MR job I wrote in Java to Python. I have no experience with Python and am currently exploring the available libraries, looking at options such as Pydoop and mrjob. However, I want to know whether these libraries support creating similar custom Writable classes, and how to create them. If not, what alternatives exist for doing the same?

In Pydoop, explicit support for custom Hadoop types is still a work in progress. In other words, right now we're not making things easy for the user, but it can be done with a bit of work. A couple of pointers:

  • Pydoop already includes custom Java code, auto-installed together with the Python package as pydoop.jar. We pass this extra jar to Hadoop as needed. Adding more Java code is a matter of placing the source under src/ and listing it in JavaLib.java_files in setup.py (see the first sketch after this list).

  • On the Python side, you need deserializers for the new types. See, for instance, LongWritableDeserializer in pydoop.mapreduce.pipes (a minimal example of the same pattern is sketched after this list).
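To make the first pointer concrete, here is a rough sketch of what the setup.py change might look like. The name JavaLib and its java_files attribute come straight from the answer above; every path in the list, and the surrounding class structure, are illustrative placeholders, so check the actual Pydoop source tree for the real layout.

# Hypothetical fragment of Pydoop's setup.py. Only JavaLib.java_files
# is taken from the answer; the paths below are made-up examples.
class JavaLib(object):
    def __init__(self):
        self.java_files = [
            # ... the Java sources Pydoop already ships with ...
            "src/your/package/MyCustomWritable.java",  # your new type
        ]

With the source listed here, it gets compiled into pydoop.jar at install time and shipped to Hadoop along with the rest of Pydoop's Java code, per the first bullet.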
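For the second pointer, below is a minimal Python sketch of a deserializer for a hypothetical custom type, modeled on the pattern the answer attributes to LongWritableDeserializer. The class name, the deserialize method, and the wire format (two big-endian doubles) are all assumptions for illustration; the real interface to conform to is whatever pydoop.mapreduce.pipes defines.

import io
import struct

# Sketch of a deserializer for a hypothetical PointWritable whose Java
# write() emits two big-endian doubles (Hadoop Writables use network
# byte order). Method name and stream interface are assumptions; mirror
# what LongWritableDeserializer does in pydoop.mapreduce.pipes.
class PointWritableDeserializer(object):
    def deserialize(self, stream):
        raw = stream.read(16)  # two 8-byte IEEE 754 doubles
        x, y = struct.unpack(">dd", raw)  # ">" = big-endian
        return (x, y)

# Quick self-test with an in-memory stream standing in for the real input:
buf = io.BytesIO(struct.pack(">dd", 1.5, -2.0))
print(PointWritableDeserializer().deserialize(buf))  # (1.5, -2.0)

The key design point is that the Python side must read exactly the byte layout that the Java Writable's write() method produces, in the same order and byte order.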

Hope this helps.

