Running Java jar with included config via Maven on Flink YARN cluster

I am using Flink in a Maven/Java project and need to include my configs inside the created jar.

So I have added the following to my pom file. It includes all my yml configs (located in the src/main/resources folder) in the jar; I pass the name of the config to use as an argument at execution time.

    <resources>
        <resource>
            <directory>src/main/resources</directory>
            <includes>
                <include>**/*.yml</include>
            </includes>
        </resource>
    </resources>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <finalName>${project.artifactId}-${project.version}</finalName>
                        <shadedArtifactAttached>true</shadedArtifactAttached>
                        <transformers>
                            <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.example.MyApplication</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>

The following main class receives an argument that determines which config to pick from the resources, read (using SnakeYAML), and use.

public static void main(String[] args) throws Exception {
    final ParameterTool parameterTool = ParameterTool.fromArgs(args);

    ClassLoader classLoader = MyApplication.class.getClassLoader();
    Yaml yaml = new Yaml();

    String filename = parameterTool.getRequired("configFilename");

    InputStream in = classLoader.getSystemResourceAsStream(filename);
    MyConfigClass config = yaml.loadAs(in, MyConfigClass.class);

    ...

}

mvn clean install creates "my-shaded-jar.jar", which I execute using the command:

java -jar /path/to/my-shaded-jar.jar --configFilename filename

It works on multiple systems when I share the jar with others.

However, I am facing an issue when I try to run the same jar on a YARN cluster on Hadoop, using the following command:

HADOOP_CLASSPATH=`hadoop classpath` HADOOP_CONF_DIR=/etc/hadoop/conf ./flink-1.6.2/bin/flink run -m yarn-cluster -yd -yn 5 -ys 30 -yjm 10240 -ytm 10240 -yst -ynm some-job-name -yqu queue-name ./my-shaded-jar.jar --configFilename filename

I am getting the following error:

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
    at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)
    at org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:78)
    at org.apache.flink.client.program.PackagedProgramUtils.createJobGraph(PackagedProgramUtils.java:120)
    at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:238)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:216)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1053)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1129)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1129)
Caused by: org.yaml.snakeyaml.error.YAMLException: java.io.IOException: Stream closed
    at org.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:200)
    at org.yaml.snakeyaml.reader.StreamReader.<init>(StreamReader.java:60)
    at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:444)
    at com.example.MyApplication.main(MyApplication.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
    ... 13 more
Caused by: java.io.IOException: Stream closed
    at java.io.PushbackInputStream.ensureOpen(PushbackInputStream.java:74)
    at java.io.PushbackInputStream.read(PushbackInputStream.java:166)
    at org.yaml.snakeyaml.reader.UnicodeReader.init(UnicodeReader.java:90)
    at org.yaml.snakeyaml.reader.UnicodeReader.read(UnicodeReader.java:122)
    at java.io.Reader.read(Reader.java:140)
    at org.yaml.snakeyaml.reader.StreamReader.update(StreamReader.java:184)

Why does my solution work on normal Linux/Mac systems, while the same jar with the same args fails when run via the flink run command on a YARN cluster? Is there a difference between how we generally execute jars and how YARN does it?

Any help appreciated.

Replace classLoader.getSystemResourceAsStream(filename) with classLoader.getResourceAsStream(filename), as shown in the sketch after the list below.

  1. java.lang.ClassLoader#getSystemResourceAsStream locates the resource through the system class loader, which is typically used to start the application.

  2. java.lang.ClassLoader#getResourceAsStream will first search the parent class loader. Failing that, it invokes findResource on the current class loader.
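
Applied to the main method from the question, the lookup would become something like this (a minimal sketch reusing the question's names MyApplication, MyConfigClass and configFilename; the null check and try-with-resources are additions, not part of the original code):

    // The class loader that loaded MyApplication is the user code class
    // loader under Flink, so it can see resources inside the shaded jar.
    ClassLoader classLoader = MyApplication.class.getClassLoader();
    Yaml yaml = new Yaml();

    String filename = parameterTool.getRequired("configFilename");

    try (InputStream in = classLoader.getResourceAsStream(filename)) {
        if (in == null) {
            throw new IllegalArgumentException("Config not found on classpath: " + filename);
        }
        MyConfigClass config = yaml.loadAs(in, MyConfigClass.class);
        // ...
    }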

To avoid dependency conflicts, classes in Flink applications are divided into two domains [1]; this also applies to the Flink client, e.g. CliFrontend.

  1. The Java Classpath includes the classes of Apache Flink and its core dependencies.

  2. The Dynamic User Code includes the classes (and resources) of the user jars.

So in order to find your config file, which is packaged in your jar, we should use the user code class loader (you can find the details of userCodeClassLoader in org.apache.flink.client.program.PackagedProgram) instead of the system class loader. The sketch below illustrates the two domains at work.
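
For illustration, comparing the two lookups from inside the jar's main method would behave roughly like this under flink run (an assumed diagnostic sketch, not part of the original question):

    // Looked up via the system class loader: it only sees the Java
    // Classpath (flink-dist, Hadoop, ...), so this prints null under `flink run`.
    System.out.println(ClassLoader.getSystemResource(filename));

    // Looked up via the class loader that loaded the user jar: it also
    // sees the Dynamic User Code, so the packaged resource is found.
    System.out.println(MyApplication.class.getClassLoader().getResource(filename));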

  1. https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html
