
Google Dataproc node idle

One of the nodes in my Dataproc cluster is always idle when running a Spark job. I've tried deleting and recreating the cluster, etc., but there is always one idle node.

The reason seems to be indicated by these three lines from the log that come up every few seconds:

Trying to fulfill reservation for application application_1476080745886_0001 on node: cluster-4-w-0.c.xxxx.internal:39080
Reserved container  application=application_1476080745886_0001 resource=<memory:4608, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:25600, vCores:6>, usedCapacity=0.90909094, absoluteUsedCapacity=0.90909094, numApps=1, numContainers=6 usedCapacity=0.90909094 absoluteUsedCapacity=0.90909094 used=<memory:25600, vCores:6> cluster=<memory:28160, vCores:40>
Skipping scheduling since node cluster-4-w-0.c.xxxx.internal:39080 is reserved by application appattempt_1476080745886_0001_000001

Node cluster-4-w-0.c.xxxx.internal is the idle one. Why is a node reserved by appattempt_1476080745886_0001_000001 and unusable as an executor?

Since the app attempt matches the application ID of your Spark application, I believe the app attempt is Spark's YARN AppMaster. By default, Spark AppMasters have the same (somewhat excessive) footprint as executors, i.e. half a node, so by default half a worker should be consumed. The log is consistent with a reservation caused by tight packing: the cluster has 28160 MB in total, 25600 MB are already in use, and the pending container wants 4608 MB, which is more than the 2560 MB remaining, so YARN reserves the node until enough capacity frees up.

If you didn't change any memory configuration, I'm not sure why there isn't at least one executor on that node. In any case, you can shrink the AppMaster by decreasing spark.yarn.am.cores and spark.yarn.am.memory.
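For example, a minimal sketch of a submission that shrinks the AppMaster (the 1g / 1 core values and the job file name are illustrative only; also note these two properties apply in yarn-client mode, which is Dataproc's default submission mode, while in cluster mode the equivalent knobs are spark.driver.memory and spark.driver.cores):

spark-submit \
  --conf spark.yarn.am.memory=1g \
  --conf spark.yarn.am.cores=1 \
  your-job.py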

You can better debug the container packing by SSHing into the cluster and running yarn application -list, or by navigating to the ResourceManager's web UI.
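For instance (the master node name cluster-4-m is an assumption based on the worker names in your logs, and 8088 is the stock Hadoop ResourceManager web UI port):

gcloud compute ssh cluster-4-m
yarn application -list
yarn node -list -all

The ResourceManager web UI would then be at http://cluster-4-m:8088, reachable through an SSH tunnel to the master node.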
