
Spark fails with too many open files on HDInsight YARN cluster

I am running into the same issue as in this thread with my Scala Spark Streaming application: Why does Spark job fail with "too many open files"?

However, since I am using Azure HDInsight to deploy my YARN cluster, I don't think I can log into every machine and update the ulimit on all of them.

Is there any other way to solve this problem? I also cannot reduce the number of reducers by much, or my job will become significantly slower.

You can SSH into all nodes from the head node (the Ambari UI shows the FQDNs of all nodes):

ssh sshuser@nameofthecluster.azurehdinsight.net
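For example, a rough sketch of doing this by hand from the head node (the worker FQDNs below are hypothetical placeholders; copy the real ones from the Ambari hosts page, and this assumes sshuser has sudo rights on the nodes):

    # Hypothetical worker-node FQDNs, copied from the Ambari UI hosts page
    WORKERS="wn0-mycluster wn1-mycluster wn2-mycluster"

    for host in $WORKERS; do
        # Show the current open-file limit on the node
        ssh sshuser@"$host" 'ulimit -n'
        # Raise the soft and hard nofile limits persistently (new sessions pick this up)
        ssh sshuser@"$host" "echo '* soft nofile 65536' | sudo tee -a /etc/security/limits.conf >/dev/null"
        ssh sshuser@"$host" "echo '* hard nofile 65536' | sudo tee -a /etc/security/limits.conf >/dev/null"
    done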

You can then write a custom script action that alters the settings on the necessary nodes if you want to automate this, as sketched below.
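A minimal sketch of such a script (assuming you save it as something like raise-ulimit.sh, upload it to accessible storage, and run it as an HDInsight Script Action against the worker nodes; the limit value 65536 is only an illustrative choice):

    #!/bin/bash
    # Raise the open-file descriptor limit on this node.
    # Intended to run as an HDInsight Script Action, which executes with root privileges.

    LIMIT=65536

    # Persist the limit for all users so new sessions pick it up
    echo "* soft nofile ${LIMIT}" >> /etc/security/limits.conf
    echo "* hard nofile ${LIMIT}" >> /etc/security/limits.conf

    # Best effort: raise the limit for the current shell as well
    ulimit -n "${LIMIT}" || true

After changing the limits you will likely also need to restart the YARN services (for example from the Ambari UI) so that the NodeManagers, and the executors they launch, pick up the new values.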
