
How to submit Flink job to a remote YARN cluster?

I installed Apache Hadoop on 4 nodes using Apache Ambari, and I wrote a simple job with Apache Flink. I want to submit this job to my YARN cluster, but Flink needs the YARN configuration files (core-site.xml, yarn-site.xml, etc.) on my local machine. So, if I am not misunderstanding, there are two manual ways:

  1. start the Flink job on the ResourceManager node (so it can find the config files)
  2. download the config files from the ResourceManager to the local machine.

Neither of these two ways seems very good to me. How can I submit my job to a remote YARN cluster? Is there a proper way to do this?

In the Hadoop/YARN world, you always need the configuration files on your client machine, so you would need to fetch them locally. However, you usually need only some of them, not all. In most cases, it should be enough to have hdfs-site.xml, core-site.xml and yarn-site.xml - if I am not mistaken. To be on the safe side, copy all of them into a local directory.
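For example, a minimal sketch of fetching them with scp; the user name, the host name resourcemanager-host, and the local target path are placeholders, and /etc/hadoop/conf is just where Ambari usually puts the configs - verify the location on your cluster:

scp user@resourcemanager-host:/etc/hadoop/conf/core-site.xml \
    user@resourcemanager-host:/etc/hadoop/conf/hdfs-site.xml \
    user@resourcemanager-host:/etc/hadoop/conf/yarn-site.xml \
    /path/to/local/hadoop-conf/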

Then configure the following parameter in the flink-conf.yaml file on the machine that will play the role of the client, i.e. the machine you will launch your job from.

fs.hdfs.hadoopconf: path_to_hadoop_conf_dir
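For instance, if the files were copied to /path/to/local/hadoop-conf (a placeholder path), the entry would read:

fs.hdfs.hadoopconf: /path/to/local/hadoop-conf

Alternatively, exporting the HADOOP_CONF_DIR environment variable in the shell before calling the flink tool should also let Flink pick up the files:

export HADOOP_CONF_DIR=/path/to/local/hadoop-conf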

Then you should be able to launch a YARN job by telling the flink tool to use a yarn-master as the job manager.

flink run -m yarn-cluster -yn <num_task_managers> -yjm <job_manager_memory> -ytm <task_manager_memory> -c <main_class> <jar>
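As a concrete example with made-up values - 4 task managers, 1024 MB for the job manager, 2048 MB per task manager, and a placeholder main class and jar:

flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 2048 -c org.example.WordCount /path/to/my-job.jar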

If you have configured the above memory parameters in your flink-conf.yaml, it should be possible to launch the job with the default values by omitting all those verbose parameters:

flink run -m yarn-cluster -yn <num_task_managers> -c <main_class> <jar>
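The defaults would then come from flink-conf.yaml; in Flink versions of that era the relevant keys were jobmanager.heap.mb and taskmanager.heap.mb (key names have changed across Flink versions, so treat this as a sketch):

jobmanager.heap.mb: 1024
taskmanager.heap.mb: 2048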

As a quick test, you could try to launch a Scala shell on YARN:

start-scala-shell.sh yarn -n <num_task_managers> -nm test_job

I believe this is more a question about starting your YARN client (which Flink happens to be) than about Flink itself.

I know very little about Flink, but given my knowledge about Spark on YARN, I can say you can only do option 2, i.e. download the config files to the machine you're going to use to start your Flink application. You could also use an edge machine in the YARN cluster as the machine to deploy your application from.

Again, I believe it's more a question about how to do application deployment to YARN.
