
Running Java Jar with included config via maven on flink yarn cluster

I am using flink in a maven/java project and need to include my configs internally in the created jar.

So, I have added the following to my pom file. This bundles all my yml configs (located in the src/main/resources folder) into the jar; the name of the config to use is passed as an argument at execution time.

    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <includes>
                <include>**/*.yml</include>
            </includes>
        </resource>
    </resources>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <finalName>${project.artifactId}-${project.version}</finalName>
                        <shadedArtifactAttached>true</shadedArtifactAttached>
                        <transformers>
                            <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.example.MyApplication</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
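For reference, a minimal example of the kind of yml config this setup would bundle (the filename and fields below are hypothetical; they must match the bean properties of the class passed to SnakeYAML):

```yaml
# Hypothetical file at src/main/resources/prod-config.yml;
# field names must match the setters/fields of MyConfigClass.
jobName: my-flink-job
parallelism: 4
kafka:
  bootstrapServers: localhost:9092
  topic: events
```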

The following main-class code receives an argument that determines which config to pick from the resources, read (using SnakeYAML), and use.

public static void main(String[] args) throws Exception {
    final ParameterTool parameterTool = ParameterTool.fromArgs(args);

    ClassLoader classLoader = MyApplication.class.getClassLoader();
    Yaml yaml = new Yaml();

    String filename = parameterTool.getRequired("configFilename");

    InputStream in  = classLoader.getSystemResourceAsStream(filename);
    MyConfigClass config = yaml.loadAs(in, MyConfigClass.class);

    ...

}

mvn clean install creates "my-shaded-jar.jar",

which I execute using the command:

java -jar /path/to/my-shaded-jar.jar --configFilename filename

It works on multiple systems when I share the jar with others.

However, I am facing an issue when I try to run the same jar on a Hadoop YARN cluster, using the following command:

HADOOP_CLASSPATH=`hadoop classpath` HADOOP_CONF_DIR=/etc/hadoop/conf ./flink-1.6.2/bin/flink run -m yarn-cluster -yd -yn 5 -ys 30 -yjm 10240 -ytm 10240 -yst -ynm some-job-name -yqu queue-name ./my-shaded-jar.jar --configFilename filename

I am getting the following error:

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
    at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
    at org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:78)
    at org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:120)
    at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:238)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:216)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1053)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1129)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1129)
Caused by: org.yaml.snakeyaml.error.YAMLException: java.io.IOException: Stream closed
    at org.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:200)
    at org.yaml.snakeyaml.reader.StreamReader.<init>(StreamReader.java:60)
    at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:444)
    at com.example.MyApplication.main(MyApplication.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
    ... 13 more
Caused by: java.io.IOException: Stream closed
    at java.io.PushbackInputStream.ensureOpen(PushbackInputStream.java:74)
    at java.io.PushbackInputStream.read(PushbackInputStream.java:166)
    at org.yaml.snakeyaml.reader.UnicodeReader.init(UnicodeReader.java:90)
    at org.yaml.snakeyaml.reader.UnicodeReader.read(UnicodeReader.java:122)
    at java.io.Reader.read(Reader.java:140)
    at org.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:184)

Why does my solution work on any normal Linux/Mac system, while the same jar with the same args fails when run with the flink run command on a YARN cluster? Is there a difference between how jars are normally executed and how YARN executes them?

Any help appreciated.

Replace classLoader.getSystemResourceAsStream(filename) with classLoader.getResourceAsStream(filename).

  1. java.lang.ClassLoader#getSystemResourceAsStream locates the resource through the system class loader, which is typically the one used to start the application.

  2. java.lang.ClassLoader#getResourceAsStream will first search the parent class loader; failing that, it will invoke findResource of the current class loader.
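A minimal, dependency-free sketch of the corrected lookup (the SnakeYAML call is left as a comment so the snippet stands alone; the class and resource names are hypothetical):

```java
import java.io.InputStream;

public class MyApplicationSketch {
    public static void main(String[] args) throws Exception {
        String filename = args.length > 0 ? args[0] : "prod-config.yml";

        // Use the class loader that loaded this class -- under Flink this is
        // the user-code class loader, which can see resources in the shaded jar.
        ClassLoader classLoader = MyApplicationSketch.class.getClassLoader();
        try (InputStream in = classLoader.getResourceAsStream(filename)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + filename);
            }
            // MyConfigClass config = new Yaml().loadAs(in, MyConfigClass.class);
        }
    }
}
```

Checking the stream for null before handing it to SnakeYAML also turns the opaque "Stream closed" failure into a clear "resource not found" message.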

To avoid dependency conflicts, classes in Flink applications are divided into two domains [1], which also applies to the Flink client, e.g. CliFrontend.

The Java Classpath includes the classes of Apache Flink and its core dependencies.
The Dynamic User Code includes the classes (and resources) of user jars.

So, in order to find your config file, which is packaged in your jar file, you should use the user code class loader (you can find the details of userCodeClassLoader in org.apache.flink.client.program.PackagedProgram) instead of the system class loader.
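The distinction can be demonstrated without Flink by putting a resource behind a child URLClassLoader, loosely analogous to Flink's user-code class loader wrapping the submitted jar (the file and loader names here are illustrative only):

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderDemo {
    public static void main(String[] args) throws Exception {
        // Create a temp directory containing a resource file, then expose it
        // through a child URLClassLoader (parent = system class loader).
        Path dir = Files.createTempDirectory("res");
        Files.write(dir.resolve("flink-demo-app.yml"), "name: demo\n".getBytes());

        try (URLClassLoader userCodeLoader =
                 new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            // The system class loader knows nothing about the temp directory:
            InputStream viaSystem =
                ClassLoader.getSystemResourceAsStream("flink-demo-app.yml");
            System.out.println("system loader: " + (viaSystem == null ? "miss" : "hit"));

            // The child loader checks its parent first, then finds it itself:
            try (InputStream viaUser =
                     userCodeLoader.getResourceAsStream("flink-demo-app.yml")) {
                System.out.println("user-code loader: " + (viaUser == null ? "miss" : "hit"));
            }
        }
    }
}
```

Under Flink, the jar's resources sit on the "user code" side of exactly this kind of boundary, which is why getSystemResourceAsStream misses them while getResourceAsStream on the user-code class loader succeeds.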

  1. https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html
