
How to serialize an object in Hadoop (to HDFS)

I have a HashMap<String, ArrayList<Integer>>. I want to serialize my HashMap object (hmap) to an HDFS location and later deserialize it in my Mapper and Reducers to use it there.

To serialize my HashMap object to HDFS I used the normal Java object serialization code below, but got an error (permission denied):

try {
    FileOutputStream fileOut = new FileOutputStream("hashmap.ser");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    out.writeObject(hm);
    out.close();
} catch (Exception e) {
    e.printStackTrace();
}

I got the following exception:

java.io.FileNotFoundException: hashmap.ser (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
    at KMerIndex.createIndex(KMerIndex.java:121)
    at MyDriverClass.formRefIndex(MyDriverClass.java:717)
    at MyDriverClass.main(MyDriverClass.java:768)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Can someone please suggest or share sample code for serializing an object to HDFS in Hadoop?

Please try using SerializationUtils from Apache Commons Lang.

Below are its methods:

static Object clone(Serializable object)                             // Deep clone an Object using serialization.
static Object deserialize(byte[] objectData)                         // Deserializes a single Object from an array of bytes.
static Object deserialize(InputStream inputStream)                   // Deserializes an Object from the specified stream.
static byte[] serialize(Serializable obj)                            // Serializes an Object to a byte array for storage/serialization.
static void   serialize(Serializable obj, OutputStream outputStream) // Serializes an Object to the specified stream.

When storing to HDFS, you can write the byte[] returned by serialize(). When reading the object back, you can cast it to the corresponding type (for example, back to your HashMap) and get it as it was, as in the sketch below.
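Here is a minimal, hedged sketch of that approach (the HDFS path and class name are hypothetical; import org.apache.commons.lang.SerializationUtils instead if your cluster ships Commons Lang 2.x). Note that java.io.FileOutputStream in your original snippet writes to the local working directory, which is likely not writable when launched via hadoop jar; it never touches HDFS at all:

import java.util.ArrayList;
import java.util.HashMap;

import org.apache.commons.lang3.SerializationUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMapSerDe {

    // Hypothetical HDFS location; point it at a directory you can write to.
    private static final Path SER_PATH = new Path("/user/me/hashmap.ser");

    // Serialize the map to a byte[] and write it to a file on HDFS.
    public static void writeMap(HashMap<String, ArrayList<Integer>> hm) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(SER_PATH, true)) { // true = overwrite
            out.write(SerializationUtils.serialize(hm));
        }
    }

    // Open the HDFS file, deserialize one object, and cast it back to the stored type.
    @SuppressWarnings("unchecked")
    public static HashMap<String, ArrayList<Integer>> readMap() throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataInputStream in = fs.open(SER_PATH)) {
            return (HashMap<String, ArrayList<Integer>>) SerializationUtils.deserialize(in);
        }
    }
}

In a Mapper or Reducer you would typically call readMap() once from setup(), so every task reads the same map that the driver wrote before submitting the job.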

In my case, I stored a HashMap in an HBase column and later retrieved it in my mapper method as a HashMap, as it was, and that worked; a sketch of the pattern follows.
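For illustration, a sketch of that HBase pattern under assumptions (HBase 1.x+ client API, an already-open org.apache.hadoop.hbase.client.Table named table, org.apache.hadoop.hbase.util.Bytes, and hypothetical row/family/qualifier names; on older clients use put.add(...) instead of put.addColumn(...)):

// Hypothetical coordinates for the cell holding the map.
byte[] row = Bytes.toBytes("refIndex");
byte[] cf  = Bytes.toBytes("cf");
byte[] col = Bytes.toBytes("map");

// Write: store the serialized HashMap in a single HBase cell.
Put put = new Put(row);
put.addColumn(cf, col, SerializationUtils.serialize(hm));
table.put(put);

// Read (e.g. in the mapper's setup()): fetch the cell and cast the bytes back.
Result result = table.get(new Get(row));
HashMap<String, ArrayList<Integer>> restored =
        (HashMap<String, ArrayList<Integer>>) SerializationUtils.deserialize(result.getValue(cf, col));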

Surely, you can do the same thing in your case...

Another option is Apache Commons IO (org.apache.commons.io.FileUtils); but you will then need to copy the resulting file to HDFS yourself, since you want HDFS as the datastore.

FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray);
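The copy step could then look like the following hedged sketch (both paths are hypothetical; adjust them to your environment):

// Copy the locally written file (the "pathname" above) up to HDFS.
FileSystem fs = FileSystem.get(new Configuration());
fs.copyFromLocalFile(new Path("/local/path/hashmap.ser"), new Path("/user/me/hashmap.ser"));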

Note: both jars (Apache Commons IO and Apache Commons Lang) ship with Hadoop, so they are typically already available on the cluster classpath.
