
Using Spark to retrieve a HDFS file protected by Kerberos

I am having issues setting up my Spark environment to read from a Kerberized HDFS file location.

At the moment I have tried the following:

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkContext

def ugiDoAs[T](ugi: Option[UserGroupInformation])(code: => T): T = ugi match {
  case None => code
  case Some(u) => u.doAs(new PrivilegedExceptionAction[T] {
    override def run(): T = code
  })
}

val sparkConf = defaultSparkConf.setAppName("file-test").setMaster("yarn-client")

val sc = ugiDoAs(ugi) { new SparkContext(sparkConf) }

val file = sc.textFile("path")
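For context, the pattern I am aiming for is an explicit keytab login so that the UGI definitely holds Kerberos credentials before the SparkContext is created. A minimal sketch (the principal, keytab path, and app name are placeholders, not my real values):

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

// Picks up core-site.xml / hdfs-site.xml from HADOOP_CONF_DIR
val hadoopConf = new Configuration()
UserGroupInformation.setConfiguration(hadoopConf)

// Explicit keytab login; returns a UGI already authenticated via Kerberos.
// Principal and keytab path below are placeholders.
val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "user@EXAMPLE.COM", "/path/to/user.keytab")

val sc = ugi.doAs(new PrivilegedExceptionAction[SparkContext] {
  override def run(): SparkContext =
    new SparkContext(new SparkConf().setAppName("file-test").setMaster("yarn-client"))
})
```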

It fails at the point of creating the SparkContext, with the following error:

Exception in thread "main" org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)

Has anyone got a simple example of how to allow Spark to connect to a Kerberized HDFS location?

I know that Spark needs to be in YARN mode to make this work, but the login method does not seem to be working in this respect. I do know that the UserGroupInformation (UGI) object is valid, as I have used it in the same object to connect to ZK and HBase.

Confirm that conf/spark-env.sh is configured, or set:

export HADOOP_CONF_DIR=/etc/hadoop/conf

This must point to the client configs for your cluster.
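With the client configs in place, the driver also needs a valid Kerberos ticket before it talks to YARN. A common approach is a `kinit` before launching, or letting Spark manage the keytab itself via its YARN-mode flags (principal and paths below are placeholders for your own):

```shell
# Obtain a ticket before launching, so the driver picks up the ticket cache
kinit -kt /path/to/user.keytab user@EXAMPLE.COM

# Or let Spark handle login and ticket renewal itself (YARN mode)
spark-submit --master yarn \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  your-app.jar
```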

The error implies that the client is trying to talk to HDFS unauthenticated, and that is being rejected. Make sure the UGI really is secure by logging it, and run some basic Hadoop filesystem code before moving on to Spark; that should make the problem easier to track down.
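The check above can be sketched as follows: log what the login UGI actually holds, then do a plain HDFS listing with no Spark involved. If this fails, the problem is in the Hadoop/Kerberos setup rather than Spark. (This assumes a prior `kinit` or keytab login; the `/` path is just an example.)

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
UserGroupInformation.setConfiguration(conf)

// Inspect the current login UGI; auth should report KERBEROS on a secure cluster
val ugi = UserGroupInformation.getLoginUser
println(s"user=${ugi.getUserName} auth=${ugi.getAuthenticationMethod} " +
  s"hasKerberosCredentials=${ugi.hasKerberosCredentials}")

// Plain HDFS access, no Spark: isolates Kerberos problems from Spark problems
val fs = FileSystem.get(conf)
fs.listStatus(new Path("/")).foreach(status => println(status.getPath))
```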
