
How to add hive auxiliary jars to Dataproc cluster

When you start a Hive session in Dataproc, you can add jars that live in a GCS bucket:
add jar gs://my-bucket/serde.jar;

I don't want to add all the jars I need each time I start a Hive session, so I tried adding the jar paths to hive-site.xml in the hive.aux.jars.path property:

<property>
  <name>hive.aux.jars.path</name>
  <value>gs://my-bucket/serde.jar</value>
</property>

Then I get hit with this error when trying to start a Hive session:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://gs, expected: file:///

Is there a way to add custom jars that live in a GCS bucket to the Hive classpath, or would I have to copy the jars from my bucket and update hive.aux.jars.path each time I create a Dataproc cluster?

*edit*
Even after adding the below property and restarting Hive, I still get the same error:

  <property>
    <name>hive.exim.uri.scheme.whitelist</name>
    <value>hdfs,pfile,gs</value>
    <final>false</final>
  </property>

This is a known Hive bug (HIVE-18871): hive.aux.jars.path supports only local paths in Hive 3.1 and lower.

A workaround is to use a Dataproc initialization action that copies the jars from GCS to the same local FS path on all Dataproc cluster nodes, and then to specify this local path as the value of the hive.aux.jars.path property.
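A minimal initialization-action sketch of this workaround (the bucket path and local directory here are placeholders, not values from the question):

```shell
#!/bin/bash
# Hypothetical init action: copy auxiliary Hive jars from a GCS bucket to the
# same local directory on every Dataproc node. Adjust paths for your setup.
set -euo pipefail

JARS_BUCKET="gs://my-bucket"            # assumed bucket holding the jars
LOCAL_DIR="/usr/lib/hive/aux-jars"      # assumed local target directory

mkdir -p "${LOCAL_DIR}"
gsutil cp "${JARS_BUCKET}/serde.jar" "${LOCAL_DIR}/"
```

With the jars on local disk, hive.aux.jars.path can then point at a local path such as `file:///usr/lib/hive/aux-jars/serde.jar`, which Hive 3.1 and lower accepts.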

Update

The HIVE-18871 fix was backported to Dataproc 1.3+ images, so you can use GCS URIs in the hive.aux.jars.path property with new Dataproc images that have this fix.

I guess you also need to set the hive.exim.uri.scheme.whitelist property to whitelist the gs URI scheme.

So in your case, while creating the Dataproc cluster, set these properties:

hive.aux.jars.path = gs://my-bucket/serde.jar
hive.exim.uri.scheme.whitelist = hdfs,pfile,gs
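With the gcloud CLI, these two properties can be passed at cluster-creation time using the `hive:` prefix. The cluster name, region, and image version below are placeholder assumptions; note that because the whitelist value itself contains commas, the delimiter for `--properties` has to be changed (the `^#^` prefix, documented under `gcloud topic escaping`):

```shell
# Hypothetical cluster creation; name, region, and bucket are placeholders.
# The ^#^ prefix switches the list delimiter to '#' so the comma-separated
# whitelist value is not split into separate properties.
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --image-version=1.3 \
    --properties='^#^hive:hive.aux.jars.path=gs://my-bucket/serde.jar#hive:hive.exim.uri.scheme.whitelist=hdfs,pfile,gs'
```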

