
How to submit a Flink job to a remote YARN cluster?

I installed Apache Hadoop on 4 nodes using Apache Ambari, and I wrote a simple job with Apache Flink. I want to submit this job to my YARN cluster, but Flink needs the YARN configuration files (core-site.xml, yarn-site.xml, etc.) on the local machine. If I understand correctly, there are two manual ways:

  1. Start the Flink job on the ResourceManager node (so it can find the config files).
  2. Download the config files from the ResourceManager to the local machine.

I don't think either of these is a very good approach. How can I submit my job to a remote YARN cluster? Is there a more suitable way?

In the Hadoop/YARN world, you always need the configuration files on your client machine, so you would need to fetch them locally. However, you usually need only some of them, not all. In most cases it should be enough to have hdfs-site.xml, core-site.xml and yarn-site.xml, if I am not mistaken. To be on the safe side, copy all of them into a local directory.
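The fetch step above could be sketched roughly as follows. The host name, the remote path /etc/hadoop/conf and the local directory are assumptions for illustration only, not values from your cluster, and the helper function names are made up for this example. The check simply confirms that the three files mentioned above are present.

```shell
# Hypothetical values; adjust them to your own cluster.
RM_HOST="${RM_HOST:-rm-node.example.com}"     # assumed ResourceManager host
REMOTE_CONF="${REMOTE_CONF:-/etc/hadoop/conf}" # assumed config location on that host
LOCAL_CONF="${LOCAL_CONF:-$HOME/hadoop-conf}"  # where to keep the copies locally

# Copy all XML config files from the cluster node to the local directory.
fetch_hadoop_conf() {
  mkdir -p "$LOCAL_CONF"
  scp "$RM_HOST:$REMOTE_CONF/*.xml" "$LOCAL_CONF/"
}

# Return 0 only if the three files named above exist in the given directory.
check_hadoop_conf() {
  dir="$1"
  missing=0
  for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (needs SSH access to the cluster node):
#   fetch_hadoop_conf && check_hadoop_conf "$LOCAL_CONF"
```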

Then set the following parameter in the flink-conf.yaml file on the client machine, i.e. the machine you will launch your job from:

fs.hdfs.hadoopconf: path_to_hadoop_conf_dir
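If you prefer to script that setting rather than edit the file by hand, a minimal sketch could look like this. The function name and the example paths are hypothetical; only the fs.hdfs.hadoopconf key comes from the line above. It removes any existing entry for the key and appends the new one, so running it twice does not create duplicates.

```shell
# Set fs.hdfs.hadoopconf in a flink-conf.yaml file (hypothetical helper).
#   $1 = path to flink-conf.yaml
#   $2 = local directory holding the Hadoop config files
set_hadoop_conf_dir() {
  conf_file="$1"
  hadoop_dir="$2"
  tmp_file="$conf_file.tmp"
  touch "$conf_file"
  # Keep every line except an existing entry for the key.
  # grep exits non-zero when nothing is left to print, hence "|| true".
  grep -v '^fs\.hdfs\.hadoopconf:' "$conf_file" > "$tmp_file" || true
  echo "fs.hdfs.hadoopconf: $hadoop_dir" >> "$tmp_file"
  mv "$tmp_file" "$conf_file"
}

# Usage (paths are examples only):
#   set_hadoop_conf_dir "$FLINK_HOME/conf/flink-conf.yaml" "$HOME/hadoop-conf"
```

As far as I know, Flink's YARN client can also pick the directory up from the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable, so check the documentation for your Flink version.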

Then you should be able to launch a YARN job by telling the flink tool to use yarn-cluster as the job manager:

flink run -m yarn-cluster -yn <num_task_managers> -yjm <job_manager_memory> -ytm <task_manager_memory> -c <main_class> <jar>
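To make the placeholders concrete, here is a small sketch that fills them in and prints the resulting command for review. All values (4 task managers, the memory sizes, the class name and the jar path) are made up for illustration, and I am assuming the memory values are given in MB.

```shell
# Hypothetical example values; replace them with your own.
NUM_TASK_MANAGERS=4
JOB_MANAGER_MEMORY=1024    # assumed to be in MB
TASK_MANAGER_MEMORY=2048   # assumed to be in MB
MAIN_CLASS="com.example.WordCount"
JAR="target/wordcount.jar"

# Print the submit command instead of running it, so it can be checked first.
flink_yarn_cmd() {
  printf 'flink run -m yarn-cluster -yn %s -yjm %s -ytm %s -c %s %s\n' \
    "$NUM_TASK_MANAGERS" "$JOB_MANAGER_MEMORY" "$TASK_MANAGER_MEMORY" \
    "$MAIN_CLASS" "$JAR"
}

flink_yarn_cmd
# prints: flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 2048 -c com.example.WordCount target/wordcount.jar
```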

If you have configured the above memory parameters in your flink-conf.yaml, you can omit those verbose parameters and launch the job with the default values:

flink run -m yarn-cluster -yn <num_task_managers> -c <main_class> <jar>

As a quick test, you could try to launch a Scala shell on YARN:

start-scala-shell.sh yarn -n <num_task_managers> -nm test_job

I believe this is more a question about running a YARN client (which Flink happens to be) than about Flink itself.

I know very little about Flink, but given my knowledge of Spark on YARN, I can say you can only do option 2, i.e. download the config files to the machine you are going to use to start your Flink application. You could also use an edge machine in the YARN cluster as the machine to deploy your application from.

Again, I believe it's more a question about how to do application deployment to YARN.


 