
Access a secured Hive when running Spark in an unsecured YARN cluster

We have two Cloudera 5.7.1 clusters, one secured with Kerberos and one unsecured.

Is it possible to run Spark on the unsecured YARN cluster while accessing Hive tables stored in the secured cluster? (The Spark version is 1.6.)

If so, can you explain how I can get it configured?

Update:

I want to explain the end goal behind my question. Our main, secured cluster is heavily utilized, and our jobs can't get enough resources to complete in a reasonable time. To overcome this, we would like to use resources from another, unsecured cluster we have, without needing to copy the data between the clusters.

We know it's not the best solution, since data locality may not be optimal, but it's the best solution we can come up with for now.

Please let me know if you have any other solution, as it seems we can't achieve the above.

If you run Spark in local mode, you can make it use an arbitrary set of Hadoop conf files -- i.e. core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml -- copied from the Kerberized cluster.
So you can access HDFS on that cluster -- provided you have a Kerberos ticket that grants you access to it, of course.

  # point Hadoop/Spark clients at the conf copied from the Kerberized cluster
  export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
  # obtain a Kerberos ticket for that cluster's realm
  kinit sylvestre@WORLD.COMPANY
  # run Spark locally, without YARN
  spark-shell --master local[*]
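
Once the shell is up, you can query Hive directly: in Spark 1.6, the sqlContext provided by spark-shell is already a HiveContext when Spark is built with Hive support (as CDH's is). A minimal sketch, where the database and table names are hypothetical:

  scala> // runs against the remote cluster's Hive metastore and HDFS
  scala> sqlContext.sql("SELECT COUNT(*) FROM mydb.mytable").show()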

But in yarn-client or yarn-cluster mode, you cannot launch containers in the local cluster and access HDFS in the other:

  • either you use the local core-site.xml, which says that hadoop.security.authentication is simple, and you can connect to the local YARN/HDFS;
  • or you point to a copy of the remote core-site.xml, which says that hadoop.security.authentication is kerberos, and you can connect to the remote YARN/HDFS (the two settings are sketched after this list);
  • but you cannot use the local, unsecured YARN and access the remote, secured HDFS.
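
For reference, these are the mutually exclusive settings in each cluster's core-site.xml (a sketch; the rest of each file is omitted):

  <!-- core-site.xml on the local, unsecured cluster -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>

  <!-- core-site.xml copied from the remote, Kerberized cluster -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>

A client JVM loads a single effective value for this property, which is why one job cannot speak simple to one cluster and kerberos to the other.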

Note that with unsecure-unsecure or secure-secure combinations, you could access HDFS in another cluster by hacking your own custom hdfs-site.xml to define multiple namespaces. But you are stuck with a single authentication model.
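
A minimal sketch of that multi-namespace hack, using HDFS-federation-style client properties; the nameservice names and hostnames are hypothetical, and HA nameservices would need additional dfs.ha.* settings:

  <!-- hdfs-site.xml: declare both clusters' namespaces -->
  <property>
    <name>dfs.nameservices</name>
    <value>localns,remotens</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.localns</name>
    <value>local-nn.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.remotens</name>
    <value>remote-nn.example.com:8020</value>
  </property>

Paths on either cluster can then be addressed as hdfs://localns/... or hdfs://remotens/....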
[edit] See the comment by Mighty Steve Loughran about an extra Spark property for accessing remote, secure HDFS from a local, secure cluster.
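
The property in question is presumably spark.yarn.access.namenodes, which exists in Spark 1.6's YARN mode and lists the secure namenodes Spark should fetch delegation tokens for. A hedged sketch, with placeholder hostname and jar, and again only for the secure-to-secure case:

  spark-submit --master yarn-client \
      --conf spark.yarn.access.namenodes=hdfs://remote-nn.example.com:8020 \
      my-app.jar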

Note also that with DistCp you are stuck the same way -- except that there's a "cheat" property that lets you go from secure to unsecure.
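
That cheat is, to the best of my knowledge, ipc.client.fallback-to-simple-auth-allowed, which lets a secure client fall back to simple auth when talking to the unsecured cluster. A sketch, run from the secure cluster with placeholder hostnames and paths:

  hadoop distcp \
      -D ipc.client.fallback-to-simple-auth-allowed=true \
      hdfs://secure-nn.example.com:8020/src/path \
      hdfs://unsecure-nn.example.com:8020/dst/path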
