Using protobuf 3 with Hive and Elephant-Bird

I have a data pipeline that writes protobufs into HDFS, and now I need a way to query that data. I stumbled upon elephant-bird and Hive and have been trying to get this solution up and running for a day now.

Here are the steps that I took:

1.) Installed Hadoop 2.7.3, Hive 2.1.1 and Protobuf 3.0.0

2.) Cloned Elephant-Bird 4.16 and built it successfully

3.) Started Hive and added the elephant-bird core, hive and hadoop-compat jars

4.) Generated the Java class for the .proto file, packaged it with protobuf-java-3.0.0.jar and added it to Hive

5.) Added protobuf-java-3.0.0.jar to Hive

After all of this, I ran the following CREATE EXTERNAL TABLE statement:

create external table tracks
    row format serde 
        "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
    with serdeproperties (
        "serialization.class"="protobuf.TracksProtos$Env")
    stored as
        inputformat "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
        outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
    location '/tracks/';

And I receive this message in the logs:

2017-10-26T17:36:30,838 ERROR [main] util.Protobufs: Error invoking method getDescriptor in class class protobuf.TracksProtos$Env
java.lang.reflect.InvocationTargetException
.....
.....
.....
Caused by: java.lang.NoSuchMethodError: com.google.protobuf.Descriptors$Descriptor.getOneofs()Ljava/util/List;

I do not think the method is actually missing: I can list the jars from Hive and see that they were all added, and when I expand them I can see the classes that the error claims do not exist.

However, if I look under $HIVE_HOME/lib I see that Hive ships with protobuf-java-2.5.0.jar. Is this the cause of the error, and if so, what are my options for correcting it?
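
A NoSuchMethodError like the one in the log usually means the JVM resolved the class from a different jar than the one you expected. One way to check is to ask the JVM where it actually loaded the class from. The following is a minimal sketch, not part of Hive or elephant-bird: the class name ClassOrigin and its helper methods are hypothetical, and it only tells you something useful when run with a classpath that resembles the one Hive uses.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical helper (not part of Hive or elephant-bird): reports where the
// JVM loaded a class from and whether it exposes a given public method.
final class ClassOrigin {

    // Location (usually a jar URL) the class was loaded from.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)"; // e.g. JDK core classes
        }
        return src.getLocation().toString();
    }

    // True if the class has a public method with this name (any signature).
    static boolean hasPublicMethod(Class<?> cls, String methodName) {
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String className = args.length > 0
                ? args[0] : "com.google.protobuf.Descriptors$Descriptor";
        String methodName = args.length > 1 ? args[1] : "getOneofs";
        // Load without running static initializers; only metadata is needed.
        Class<?> cls = Class.forName(className, false,
                ClassOrigin.class.getClassLoader());
        System.out.println("loaded from: " + locationOf(cls));
        System.out.println("has " + methodName + "(): "
                + hasPublicMethod(cls, methodName));
    }
}
```

As an approximation of Hive's classpath, this could be run as java -cp "$HIVE_HOME/lib/*:." ClassOrigin. If the reported location is protobuf-java-2.5.0.jar and the method check prints false, the older jar is the one being picked up. When passing a nested class name on the command line, single-quote it so the shell does not expand the $.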

Thoughts?

I was able to resolve this issue by downloading the Hive source and compiling using the following command:

mvn -Dprotobuf.version=3.0.0 -Pdist clean package

This allowed me to use Hive with protobuf-3.0.0. Then, I needed to re-compile elephant-bird against my new installation of Hive.
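
After a rebuild like this, it can be worth confirming that only one protobuf-java version is left in the lib directories that end up on the classpath. The sketch below is a hypothetical helper (the name ProtobufJarScan is an assumption, not a Hive tool) that lists the protobuf-java jars found in each directory passed on the command line and warns when more than one is present.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds protobuf-java-<version>.jar files in lib directories.
final class ProtobufJarScan {

    // Keeps only names like protobuf-java-3.0.0.jar (a digit must follow the
    // prefix, so protobuf-java-util-*.jar is not matched), sorted by name.
    static List<String> protobufJars(List<String> fileNames) {
        return fileNames.stream()
                .filter(n -> n.matches("protobuf-java-\\d.*\\.jar"))
                .sorted()
                .collect(Collectors.toList());
    }

    // File names (not full paths) of the entries directly inside dir.
    static List<String> fileNamesIn(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        for (String dir : args) {
            List<String> jars = protobufJars(fileNamesIn(Paths.get(dir)));
            System.out.println(dir + ": " + jars);
            if (jars.size() > 1) {
                System.out.println("  warning: multiple protobuf-java versions found");
            }
        }
    }
}
```

For example, java ProtobufJarScan "$HIVE_HOME/lib" checks Hive's own lib directory. Any Hadoop lib directories that Hive also puts on its classpath can be passed as additional arguments.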
