
Spark fails with too many open files on HDInsight YARN cluster

I am running into the same issue as in this thread with my Scala Spark Streaming application: Why does Spark job fail with "too many open files"?

However, since I am using Azure HDInsight to deploy my YARN cluster, I don't think I can log into every machine and update the ulimit on all of them.

Is there any other way to solve this problem? I also cannot reduce the number of reducers by much, or my job will become significantly slower.

You can SSH into all nodes from the head node (the Ambari UI shows the FQDNs of all nodes):

ssh sshuser@nameofthecluster.azurehdinsight.net
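For example, a rough sketch of doing this by hand from the head node (the worker FQDNs below are hypothetical placeholders; copy the real ones from the Ambari hosts page, and this assumes sshuser has sudo rights on the nodes):

    # Hypothetical worker-node FQDNs, copied from the Ambari UI hosts page
    WORKERS="wn0-mycluster wn1-mycluster wn2-mycluster"

    for host in $WORKERS; do
        # Show the current open-file limit on the node
        ssh sshuser@"$host" 'ulimit -n'
        # Raise the soft and hard nofile limits persistently (new sessions pick this up)
        ssh sshuser@"$host" "echo '* soft nofile 65536' | sudo tee -a /etc/security/limits.conf >/dev/null"
        ssh sshuser@"$host" "echo '* hard nofile 65536' | sudo tee -a /etc/security/limits.conf >/dev/null"
    done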

You can then write a custom script action that alters the settings on the necessary nodes if you want to automate this, as sketched below.
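A minimal sketch of such a script (assuming you save it as something like raise-ulimit.sh, upload it to accessible storage, and run it as an HDInsight Script Action against the worker nodes; the limit value 65536 is only an illustrative choice):

    #!/bin/bash
    # Raise the open-file descriptor limit on this node.
    # Intended to run as an HDInsight Script Action, which executes with root privileges.

    LIMIT=65536

    # Persist the limit for all users so new sessions pick it up
    echo "* soft nofile ${LIMIT}" >> /etc/security/limits.conf
    echo "* hard nofile ${LIMIT}" >> /etc/security/limits.conf

    # Best effort: raise the limit for the current shell as well
    ulimit -n "${LIMIT}" || true

After changing the limits you will likely also need to restart the YARN services (for example from the Ambari UI) so that the NodeManagers, and the executors they launch, pick up the new values.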
