
How to add Hive auxiliary jars to a Dataproc cluster

When you start a Hive session on Dataproc, you can add jars that live in a GCS bucket:
add jar gs://my-bucket/serde.jar;

I don't want to add all the jars I need each time I start a Hive session, so I tried adding the jar paths to the hive.aux.jars.path property in hive-site.xml.

<property>
  <name>hive.aux.jars.path</name>
  <value>gs://my-bucket/serde.jar</value>
</property>

Then I get this error when trying to start a Hive session:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://gs, expected: file:///

Is there a way to add custom jars that live in a GCS bucket to the Hive classpath, or would I have to copy the jars from my bucket and update hive.aux.jars.path each time I create a Dataproc cluster?

Edit:
Even after adding the property below and restarting Hive, I still get the same error.

<property>
  <name>hive.exim.uri.scheme.whitelist</name>
  <value>hdfs,pfile,gs</value>
  <final>false</final>
</property>

This is a known Hive bug (HIVE-18871): hive.aux.jars.path supports only local paths in Hive 3.1 and lower.

A workaround is to use a Dataproc initialization action that copies the jars from GCS to the same local FS path on all Dataproc cluster nodes, and to set that local path as the value of the hive.aux.jars.path property.
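A minimal sketch of such an initialization action, assuming a hypothetical bucket and local directory (gs://my-bucket and /usr/local/lib/hive-aux are placeholders you would adjust):

#!/bin/bash
# Hypothetical init action: runs as root on every node at cluster creation
# and copies the auxiliary jars from GCS to the same local path on each node.
set -euo pipefail

AUX_DIR=/usr/local/lib/hive-aux   # assumed local directory; any path works
mkdir -p "${AUX_DIR}"
gsutil cp 'gs://my-bucket/*.jar' "${AUX_DIR}/"

You would upload this script to GCS, pass it with --initialization-actions when creating the cluster, and point hive.aux.jars.path at the local copy, e.g. file:///usr/local/lib/hive-aux/serde.jar.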

Update

The HIVE-18871 fix was backported to Dataproc 1.3+ images, so you can use GCS URIs in the hive.aux.jars.path property on newer Dataproc images that include the fix.

I guess you also need to set the hive.exim.uri.scheme.whitelist property to whitelist the gs URI scheme.

So in your case, when creating a Dataproc cluster, set these properties:

hive.aux.jars.path = gs://my-bucket/serde.jar
hive.exim.uri.scheme.whitelist = hdfs,pfile,gs
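
For example, with the gcloud CLI this could look like the following (the cluster name and image version are placeholders; the hive: prefix maps the keys into hive-site.xml, and the ^#^ prefix switches the --properties delimiter to # because the whitelist value itself contains commas, per gcloud topic escaping):

gcloud dataproc clusters create my-cluster \
    --image-version=1.3 \
    --properties='^#^hive:hive.aux.jars.path=gs://my-bucket/serde.jar#hive:hive.exim.uri.scheme.whitelist=hdfs,pfile,gs'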
