Launch a Spark/Spring Boot job on a YARN cluster with Kerberos enabled
I want to use Spring Boot and Spark together in a YARN cluster with Kerberos enabled. (I am a Spring Boot newbie.)
My prerequisite: I can't use spark-submit; the application is launched this way:
java -jar <my_jar>
I build the jar with spring-boot-maven-plugin.
Here is my simplified code:
@SpringBootApplication
public class A implements ApplicationRunner {

    @Autowired
    B b;

    public static void main(String[] args) {
        SpringApplication.run(A.class, args);
    }

    @Override
    public void run(ApplicationArguments args) {
        b.run();
    }
}
My B class:
@Component
public class B {
    public void run() {
        SparkSession ss = createSparkSession();
        Dataset<Row> csv = readCsvFromHDFS();
        // business logic here
        writeCsvToHdfs();
    }
}
This works well on localhost with the master set to local[*]; the main problem appears when I try to set the SparkSession master to yarn. My idea was to pass all the parameters that spark-submit would pass to my SparkSession, to avoid using spark-submit. My SparkSession is created this way:
SparkSession.builder()
.master("yarn")
.appName("appName")
.config("HADOOP_CONF_DIR", "/usr/hdp/current/hadoop-client/conf")
.config("SPARK_CONF_DIR", "/usr/hdp/current/spark2-client/conf")
.config("spark.driver.cores", "5")
.config("spark.driver.memory", "1g")
.config("spark.executor.memory", "1g")
.config("spark.logConf", "true")
.config("spark.submit.deployMode", "client")
.config("spark.executor.cores", "5")
.config("spark.hadoop.yarn.resourcemanager.address", "XXXX:8050")
.config("spark.hadoop.yarn.resourcemanager.hostname", "XXXX")
.config("spark.hadoop.security.authentication", "kerberos")
.config("hadoop.security.authorization","true")
.getOrCreate()
At the moment my error is:
java.lang.IllegalStateException: Failed to execute ApplicationRunner
...
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
My Kerberos ticket is valid before launching the application.
I think that my core-site, hdfs-site, yarn-site... are being ignored, because the SparkSession should be able to retrieve the information it needs by itself.
I tried to export it, but that changed nothing.
Is there a better way to use Spark + Spring Boot + YARN + Kerberos together that respects my prerequisite?
My versions:
Java 8
HDP: 2.6.4
Spark: 2.3.2
Spring Boot: 2.3.0.RELEASE
There are several options to solve that:

1) Handle the keytab explicitly with UserGroupInformation and run your main code in a privileged context:

private String principal;
private File keytab;

public UserGroupInformation ugi() {
    final org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
    conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, "Kerberos");
    UserGroupInformation.setConfiguration(conf);
    return UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab.getAbsolutePath());
}
And then:
ugi.doAs(() -> {
// Start Spring context here
});
2) You can locate your jar via reflection and submit it through the deploy.SparkSubmit class, with the provided keytab and principal.
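As an illustrative sketch of option 2 (the helper names here are hypothetical, not a real API): the running jar can be located through the class's ProtectionDomain, and deploy.SparkSubmit can be invoked reflectively so there is no compile-time Spark dependency. --principal and --keytab are spark-submit's own Kerberos options, so the keytab login is handled for you:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper sketching option 2: build spark-submit-style arguments
// and hand them to org.apache.spark.deploy.SparkSubmit reflectively.
class ReflectiveSubmitter {

    // Locate the jar that contains the given class (e.g. the Spring Boot fat jar).
    static String jarOf(Class<?> clazz) {
        return clazz.getProtectionDomain().getCodeSource().getLocation().getPath();
    }

    // Compose the argument list spark-submit would normally receive on the CLI.
    static String[] buildArgs(String mainClass, String jar,
                              String principal, String keytab) {
        List<String> args = new ArrayList<>();
        args.add("--master");      args.add("yarn");
        args.add("--deploy-mode"); args.add("client");
        args.add("--principal");   args.add(principal);
        args.add("--keytab");      args.add(keytab);
        args.add("--class");       args.add(mainClass);
        args.add(jar);             // application jar comes last
        return args.toArray(new String[0]);
    }

    // Invoke SparkSubmit.main(String[]) without a compile-time Spark dependency;
    // requires the Spark jars on the runtime classpath.
    static void submit(String[] args) throws Exception {
        Class<?> sparkSubmit = Class.forName("org.apache.spark.deploy.SparkSubmit");
        Method main = sparkSubmit.getMethod("main", String[].class);
        main.invoke(null, (Object) args);
    }
}
```

Note that this still assumes HADOOP_CONF_DIR is visible to the JVM as an environment variable, since Spark resolves the cluster configuration from it at submission time.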
3) Use embeddedLaunchScript in spring-boot-maven-plugin. Note: you would have to start it with ./app.jar, not java -jar app.jar:
<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <executions>
        <execution>
            <goals>
                <goal>repackage</goal>
                <goal>build-info</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <mainClass>your.Main</mainClass>
        <embeddedLaunchScript>src/main/sh/spark-submit.sh</embeddedLaunchScript>
    </configuration>
</plugin>
Where spark-submit.sh is your own implementation of spark-submit, similar to 2).
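A minimal sketch of what such an embedded src/main/sh/spark-submit.sh could look like (principal and keytab paths are placeholders; in a real script the exec would be unconditional, it is guarded here so the sketch can be dry-run on a machine without Spark):

```shell
#!/bin/sh
# Hypothetical embedded launch script: spring-boot-maven-plugin prepends this
# script to the executable jar, so "./app.jar" runs it with "$0" being the jar.

PRINCIPAL="${PRINCIPAL:-user@REALM}"                   # placeholder principal
KEYTAB="${KEYTAB:-/etc/security/keytabs/user.keytab}"  # placeholder keytab path

# Compose the spark-submit invocation for this jar, YARN client mode.
build_cmd() {
    echo "spark-submit --master yarn --deploy-mode client" \
         "--principal $PRINCIPAL --keytab $KEYTAB $0"
}

# DRY_RUN defaults to on for this sketch; set DRY_RUN=0 to actually submit.
if [ "${DRY_RUN:-1}" = "0" ]; then
    exec $(build_cmd)
fi
```

Since spark-submit owns the keytab login here, the Spring Boot code itself no longer needs any UserGroupInformation handling.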