
Low performance of YARN cluster with batch Flink jobs

I'm playing with Flink on YARN for testing purposes. I have the following setup:

3 machines on AWS (32 cores and 64 GB of memory)

I installed Hadoop 2 with the HDFS and YARN services manually (without using EMR).

Machine #1 runs HDFS (NameNode & SecondaryNameNode) and YARN (ResourceManager), defined in the masters file

Machine #2 runs HDFS (DataNode) and YARN (NodeManager), defined in the slaves file

Machine #3 runs HDFS (DataNode) and YARN (NodeManager), defined in the slaves file

I want to submit an Apache Flink job that reads about 20 GB of logs from HDFS, processes them, and then stores the result in Cassandra.

The problem is that I think I'm doing something wrong, because the job takes quite a lot of time (about an hour), and I think it's not very well optimized.

I'm running Flink with the following command:

./flink-1.3.0/bin/flink run -yn 2 -ys 30 -yjm 7000 -ytm 8000 -m yarn-cluster /home/ubuntu/reports_script-1.0-SNAPSHOT.jar
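For clarity, here is the same command with the flags annotated (as I understand them in Flink 1.3):

    # -m yarn-cluster : start a dedicated YARN session for this single job
    # -yn 2           : 2 TaskManager containers
    # -ys 30          : 30 task slots per TaskManager (2 x 30 = 60 slots in total)
    # -yjm 7000       : JobManager container memory in MB
    # -ytm 8000       : TaskManager container memory in MB
    ./flink-1.3.0/bin/flink run -m yarn-cluster -yn 2 -ys 30 -yjm 7000 -ytm 8000 \
        /home/ubuntu/reports_script-1.0-SNAPSHOT.jar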

I can see in the Flink logs that there are 60 task slots in use, but when I look at the YARN page I see very low usage of vcores and memory.

(screenshot: Hadoop YARN page)

What am I doing wrong?

A few things to look out for:

  • The default value for the number of vcores per TaskManager container is one. To increase it, use the yarn.containers.vcores parameter. Unless you use a container executor that enforces that a container only uses as many CPU cores as it has vcores, this may not make any difference to the job at all (it only looks odd in the YARN UI); see the config sketch after this list.

  • Giving 7 GB of memory to a TaskManager means it will actually get a JVM heap of around 5.2 GB, because some "cutoff" is reserved for the JVM. Having about 5.2 GB for 30 slots means roughly 170 MB of memory per slot. That works, but it is actually not a lot (see the rough math after this list).

  • Check the Flink web UI to make sure your job actually runs with the proper parallelism; the submission sketch below also passes an explicit -p. You can also check where (in which operation) the time goes.
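For the first and third points, a minimal sketch of how the submission could look, assuming you set the vcore count to match the slot count and pass an explicit default parallelism (the values 30 and 60 here are illustrative, not a recommendation):

    # Either set this once in conf/flink-conf.yaml:
    #   yarn.containers.vcores: 30
    # or pass it as a dynamic property at submission time, together with an
    # explicit default parallelism (-p) so that all 60 slots are actually used:
    ./flink-1.3.0/bin/flink run -m yarn-cluster -yn 2 -ys 30 -yjm 7000 -ytm 8000 \
        -yD yarn.containers.vcores=30 -p 60 \
        /home/ubuntu/reports_script-1.0-SNAPSHOT.jar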
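And the rough arithmetic behind the second point, assuming the default heap cutoff ratio of 0.25 (controlled by containerized.heap-cutoff-ratio, if I recall the key correctly):

    # 7000 MB container memory * (1 - 0.25 cutoff) ~= 5250 MB ~= 5.2 GB of JVM heap
    # 5250 MB / 30 slots                           ~= 175 MB of heap per slot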
