How to add hive auxiliary jars to Dataproc cluster
When you start a Hive session in Dataproc you can add jars that live in a GCS bucket.
add jar gs://my-bucket/serde.jar;
I don't want to have to add all the jars I need each time I start a Hive session, so I tried adding the jar paths to hive-site.xml in the hive.aux.jars.path property.
<property>
<name>hive.aux.jars.path</name>
<value>gs://my-bucket/serde.jar</value>
</property>
Then I get hit with this error when trying to start a Hive session.
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://gs, expected: file:///
Is there a way to add custom jars that live in a GCS bucket to the Hive classpath, or would I have to copy the jars from my bucket and update hive.aux.jars.path each time I create a Dataproc cluster?
*edit*
Even after adding the property below and restarting Hive, I still get the same error.
<property>
<name>hive.exim.uri.scheme.whitelist</name>
<value>hdfs,pfile,gs</value>
<final>false</final>
</property>
This is a known Hive bug (HIVE-18871): hive.aux.jars.path supports only local paths in Hive 3.1 and lower.
A workaround is to use a Dataproc initialization action that copies the jars from GCS to the same local FS path on all Dataproc cluster nodes, and to specify that local path as the value of the hive.aux.jars.path property.
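A minimal sketch of such an initialization action, assuming the bucket path and local directory shown here (both are placeholders you would adjust):

```shell
#!/bin/bash
# Hypothetical init action: copy auxiliary jars from GCS to the same
# local path on every cluster node, so hive.aux.jars.path can point
# at a local file:// path instead of a GCS URI.
set -euo pipefail

# Local directory that hive.aux.jars.path will reference (placeholder).
AUX_JAR_DIR=/usr/lib/hive/aux-jars
mkdir -p "${AUX_JAR_DIR}"

# gsutil is preinstalled on Dataproc nodes; replace the bucket path with yours.
gsutil cp gs://my-bucket/serde.jar "${AUX_JAR_DIR}/"
```

You would stage this script in GCS and pass it at cluster creation with `--initialization-actions gs://my-bucket/copy-aux-jars.sh`, then set `hive.aux.jars.path` to `file:///usr/lib/hive/aux-jars/serde.jar`.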
The HIVE-18871 fix was backported to Dataproc 1.3+ images, so you can use GCS URIs in the hive.aux.jars.path property with new Dataproc images that have this fix.
I guess you also need to set the hive.exim.uri.scheme.whitelist property to whitelist the gs URI scheme.
So in your case, while creating a Dataproc cluster, set the properties:
hive.aux.jars.path = gs://my-bucket/serde.jar
hive.exim.uri.scheme.whitelist = hdfs,pfile,gs
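Those properties can be passed at cluster-creation time with gcloud, for example (cluster name, region, and image version are placeholders). Because the whitelist value itself contains commas, the example uses gcloud's alternate-delimiter syntax (`^#^`) so the two properties can be separated with `#` instead:

```shell
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --image-version=1.3 \
    --properties='^#^hive:hive.aux.jars.path=gs://my-bucket/serde.jar#hive:hive.exim.uri.scheme.whitelist=hdfs,pfile,gs'
```

The `hive:` prefix tells Dataproc to write the key into hive-site.xml rather than another config file.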