
Apache Spark: local[K] master URL - job gets stuck

I am using Apache Spark 0.8.0 to process a large data file and perform some basic .map and .reduceByKey operations on the RDD.

Since I am using a single machine with multiple processors, I specify local[8] as the master URL when creating the SparkContext:

val sc = new SparkContext("local[8]", "Tower-Aggs", SPARK_HOME)
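For context, here is a minimal sketch of the kind of pipeline described above (the input path, the comma-separated parsing, and the key extraction are assumptions made for illustration, not the actual job):

import org.apache.spark.SparkContext._ // needed in Spark 0.8.x for reduceByKey on pair RDDs

// Hypothetical pipeline: count records per key in a large text file,
// using the sc created above.
val lines = sc.textFile("/path/to/large-data-file")
val counts = lines
  .map { line =>
    val key = line.split(",")(0) // hypothetical: key is the first field
    (key, 1L)
  }
  .reduceByKey(_ + _) // aggregate the counts per key
counts.saveAsTextFile("/path/to/output")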

But whenever I specify multiple processors, the job gets stuck (pauses/halts) at random. There is no definite place where it gets stuck; it is just random. Sometimes it does not happen at all. I am not sure whether the job continues after that, but it stays stuck for a long time, after which I abort it.

But when I just use local in place of local[8], the job runs seamlessly without ever getting stuck:

val sc = new SparkContext("local", "Tower-Aggs", SPARK_HOME)

I am not able to understand where the problem is.

I am using Scala 2.9.3 and sbt to build and run the application.

I'm using Spark 1.0.0 and met the same problem: if a function passed to a transformation or action waits or loops indefinitely, Spark will not wake it up, terminate it, or retry it by default, in which case you can kill the task yourself.

However, a recent feature (speculative execution of tasks) allows Spark to launch duplicate copies of tasks when a few of them take much longer than the average running time of their peers. This can be enabled and configured through the following config properties (see the sketch after the list for how to set them):

  • spark.speculation (default: false): If set to "true", performs speculative execution of tasks. This means if one or more tasks are running slowly in a stage, they will be re-launched.

  • spark.speculation.interval (default: 100): How often Spark will check for tasks to speculate on, in milliseconds.

  • spark.speculation.quantile (default: 0.75): Percentage of tasks which must be complete before speculation is enabled for a particular stage.

  • spark.speculation.multiplier (default: 1.5): How many times slower a task must be than the median to be considered for speculation.

(source: http://spark.apache.org/docs/latest/configuration.html)
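A minimal sketch of enabling these properties programmatically, assuming the SparkConf API that ships with Spark 1.0.x (in older versions they can instead be passed as system properties, e.g. on the JVM command line):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: opt in to speculative execution so straggler tasks are re-launched.
val conf = new SparkConf()
  .setMaster("local[8]")
  .setAppName("Tower-Aggs")
  .set("spark.speculation", "true")           // enable speculation
  .set("spark.speculation.interval", "100")   // check for stragglers every 100 ms
  .set("spark.speculation.quantile", "0.75")  // only after 75% of a stage's tasks finish
  .set("spark.speculation.multiplier", "1.5") // re-launch tasks 1.5x slower than the median
val sc = new SparkContext(conf)

Note that speculation launches duplicate copies of slow tasks alongside the originals; it does not interrupt a closure that blocks forever, so a function that deterministically hangs may still leave a task that has to be killed by hand.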
