
Oozie Spark action failed for kerberos environment

I am running a Spark job through an Oozie Spark action. The Spark job uses HiveContext to perform some of its work. The cluster is configured with Kerberos. When I submit the job with spark-submit from the console, it runs successfully. But when I run the job from Oozie, it fails with the following error.

18/03/18 03:34:16 INFO metastore: Trying to connect to metastore with URI thrift://localhost.local:9083
    18/03/18 03:34:16 ERROR TSaslTransport: SASL negotiation failure
    javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
            at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
            at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="workflow">
   <start to="Analysis" />
   <!-- Spark action that submits the analysis job. -->
   <action name="Analysis">
      <spark xmlns="uri:oozie:spark-action:0.1">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <master>${master}</master>
         <name>Analysis</name>
         <class>com.demo.analyzer</class>
         <jar>${appLib}</jar>
         <spark-opts>--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
      </spark>
      <ok to="sendEmail" />
      <error to="fail" />
   </action>
   <action name="sendEmail">
      <email xmlns="uri:oozie:email-action:0.1">
         <to>${emailToAddress}</to>
         <subject>Output of workflow ${wf:id()}</subject>
         <body>The Analysis job in workflow ${wf:id()} completed.</body>
      </email>
      <ok to="end" />
      <error to="end" />
   </action>
   <kill name="fail">
      <message>Spark action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
   </kill>
   <end name="end" />
</workflow-app>

Do I need to configure anything related to Kerberos in workflow.xml? Am I missing anything here?

Any help is appreciated.

Thanks in advance.

You need to add HCat credentials for the Thrift URI in the Oozie workflow. This lets Oozie obtain a delegation token for the metastore at the Thrift URI, so the job can authenticate using Kerberos.

Add the credentials block below to the Oozie workflow:

<credentials>
    <credential name="credhive" type="hcat">
        <property>
            <name>hcat.metastore.uri</name>
            <value>${thrift_uri}</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>${principal}</value>
        </property>
    </credential>
</credentials>
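For reference, Oozie expects `<credentials>` as a direct child of `<workflow-app>`, before the `<start>` node. A sketch of the overall layout (schema versions taken from the question's workflow):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="workflow">
   <!-- credentials must come before the control and action nodes -->
   <credentials>
      <credential name="credhive" type="hcat">
         <property>
            <name>hcat.metastore.uri</name>
            <value>${thrift_uri}</value>
         </property>
         <property>
            <name>hcat.metastore.principal</name>
            <value>${principal}</value>
         </property>
      </credential>
   </credentials>
   <start to="Analysis" />
   <!-- actions, kill, and end nodes as before -->
</workflow-app>
```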

Then reference the credential in the Spark action via the cred attribute, as below:

<action name="Analysis" cred="credhive">
   <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>${master}</master>
      <name>Analysis</name>
      <class>com.demo.analyzer</class>
      <jar>${appLib}</jar>
      <spark-opts>--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
   </spark>
   <ok to="sendEmail" />
   <error to="fail" />
</action>

The thrift_uri and principal can be found in hive-site.xml. thrift_uri is set by this hive-site.xml property:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://xxxxxx:9083</value>
</property>

Also, principal is set by this hive-site.xml property:

<property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@domain.COM</value>
</property>
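If you want to pull these two values out of hive-site.xml on the command line, a quick grep/sed sketch works (the sample file below just reproduces the placeholder values from this answer; point the commands at your cluster's real hive-site.xml instead):

```shell
# Sample hive-site.xml standing in for the real one on the cluster
cat > /tmp/hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://xxxxxx:9083</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@domain.COM</value>
  </property>
</configuration>
EOF

# Grab the <value> line that follows each property <name> and strip the tags
thrift_uri=$(grep -A1 'hive.metastore.uris' /tmp/hive-site.xml \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')
principal=$(grep -A1 'hive.metastore.kerberos.principal' /tmp/hive-site.xml \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')

echo "$thrift_uri"
echo "$principal"
```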

Upload your keytab to the server, then reference that keytab file and the principal as parameters in the spark-opts of your workflow. Let me know whether it works. Thanks.

<spark-opts>--keytab nagendra.keytab --principal "nagendra@domain.com"
 --jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory
 ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
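For completeness, the EL variables referenced above (thrift_uri, principal, and the rest) would typically be supplied through the job's job.properties. A sketch with placeholder values only; every host name and path here is an assumption to adjust for your cluster:

```properties
# Placeholder values - substitute your cluster's actual settings
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
master=yarn-cluster
thrift_uri=thrift://xxxxxx:9083
principal=hive/_HOST@domain.COM
oozie.use.system.libpath=true
# Hypothetical HDFS path to the directory containing workflow.xml
oozie.wf.application.path=${nameNode}/user/nagendra/workflow
```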
