
Spark fails with too many open files on HDInsight YARN cluster

I am running into the same issue as in this thread with my Scala Spark Streaming application: Why does Spark job fail with "too many open files"?

But given that I am using Azure HDInsight to deploy my YARN cluster, I don't think I can log into every machine and update the ulimit on all of them.

Is there any other way to solve this problem? I cannot reduce the number of reducers too much either, or my job will become much slower.
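For reference, the reducer count in question is usually governed by the parallelism settings passed at submit time. A minimal sketch of those knobs (the property names are standard Spark settings; the class and jar names and the value 200 are placeholders, not taken from the original job):

# Hypothetical submit command; lower parallelism means fewer shuffle files
# held open at once, at the cost of slower stages.
spark-submit \
  --class com.example.StreamingJob \
  --master yarn \
  --conf spark.default.parallelism=200 \
  --conf spark.sql.shuffle.partitions=200 \
  streaming-job.jar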

You can ssh into all nodes from the head node (the Ambari UI shows the FQDN of every node).

ssh sshuser@nameofthecluster.azurehdinsight.net
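Once on the head node, one way to raise the open-file limit everywhere is to loop over the worker FQDNs shown in Ambari and append an entry to /etc/security/limits.conf on each node. A rough sketch, assuming the ssh user can sudo on the workers and with placeholder hostnames standing in for your cluster's actual FQDNs:

#!/bin/bash
# raise_ulimit.sh - run from the head node; hostnames below are examples only
WORKERS="wn0-mycluster.internal.cloudapp.net wn1-mycluster.internal.cloudapp.net"

for host in $WORKERS; do
  # Append nofile limits for all users. New limits only apply to sessions
  # started afterwards, so the YARN NodeManagers typically need a restart
  # (e.g. from Ambari) before running containers pick them up.
  ssh sshuser@"$host" \
    "printf '* soft nofile 65536\n* hard nofile 65536\n' | sudo tee -a /etc/security/limits.conf"
done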

You can then write a custom script action that alters the settings on the necessary nodes if you want to automate this.
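HDInsight script actions are just bash scripts that the platform runs with elevated privileges on the node types you select, so the same ulimit change can be packaged that way instead of ssh-ing around. A minimal sketch of such a script (the 65536 value and the drop-in file name are illustrative choices, not anything mandated by HDInsight):

#!/bin/bash
# ulimit-scriptaction.sh - meant to be registered as an HDInsight script
# action targeting the worker node role; runs there as root.
set -e

# Raise the per-user open-file limit via a drop-in under limits.d.
cat > /etc/security/limits.d/99-nofile.conf <<'EOF'
* soft nofile 65536
* hard nofile 65536
EOF

# The new limits apply only to processes started afterwards, so services
# such as the YARN NodeManager should be restarted (e.g. from Ambari).

You would host the script at a reachable URI (for example a blob storage URL) and apply it to the worker node role from the Azure portal or CLI.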
