Spark Streaming Job OOM when I increase resources
I've got a 4-node Spark Standalone cluster with a Spark Streaming job running on it.
When I submit the job with 7 cores per executor, everything runs smoothly:
spark-submit --class com.test.StreamingJob --supervise --master spark://{SPARK_MASTER_IP}:7077 --executor-memory 30G --executor-cores 7 --total-executor-cores 28 /path/to/jar/spark-job.jar
When I increase to 24 cores per executor, none of the batches get processed and I see java.lang.OutOfMemoryError: unable to create new native thread in the executor logs. The executors then keep failing:
spark-submit --class com.test.StreamingJob --supervise --master spark://{SPARK_MASTER_IP}:7077 --executor-memory 30G --executor-cores 24 --total-executor-cores 96 /path/to/jar/spark-job.jar
Error:
17/01/12 16:01:00 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Shutdown-checker,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:534)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:146)
at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:69)
at com.datastax.driver.core.NettyOptions.onClusterClose(NettyOptions.java:190)
at com.datastax.driver.core.Connection$Factory.shutdown(Connection.java:844)
at com.datastax.driver.core.Cluster$Manager$ClusterCloseFuture$1.run(Cluster.java:2488)
I found this question and tried raising the ulimits substantially, but it had no effect.
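For what it's worth, this particular OOM ("unable to create new native thread") is usually capped by the max-user-processes limit (nproc, which threads count against) rather than the open-files limit, so it is worth confirming which limit actually applies to the OS user that runs the executors. A minimal check, as a sketch (the user name "spark" and the values in the limits.conf entries are assumptions; adjust for your setup):

```shell
# Check the limits for the OS user that runs the Spark executors.
# "unable to create new native thread" is typically capped by the
# process/thread limit (-u), not only the open-files limit (-n).
ulimit -u   # max user processes (each JVM thread counts against this)
ulimit -n   # max open file descriptors

# To raise them persistently, entries like these (hypothetical user
# name "spark"; pick values appropriate for your boxes) would go in
# /etc/security/limits.conf, followed by a fresh login:
#   spark  soft  nproc   32768
#   spark  hard  nproc   32768
#   spark  soft  nofile  65536
#   spark  hard  nofile  65536
```

Note that limits set interactively only affect the current shell; the executor processes inherit whatever limits applied when the Spark worker daemon was started.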
Each box has 32 cores and 61.8 GB of memory. The streaming job is written in Java and runs on Spark 2.0.0, connecting to Cassandra 3.7.0 with spark-cassandra-connector-java_2.10 1.5.0-M2.
The data is a very small trickle of fewer than 100 events per second, each of which is less than 200 B.
Sounds like you are running out of memory ;).
For a little more detail: the number of cores in use by Spark is directly tied to the amount of data being worked on in parallel. You can basically think of each core as working on a full Spark partition's data, which can potentially require the whole partition to reside in memory.
7 cores per executor means 7 Spark partitions are being worked on simultaneously. Bumping this number up to 24 means roughly 4 times as much RAM will be in use, which could easily cause an OOM in various places.
There are a few ways to deal with this.
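For example, two sketches based on the question's own command (the core counts and the parallelism value are illustrative assumptions, not tuned numbers): either raise the core count more gradually so memory per core stays closer to the working 7-core run, or keep the cores and make each partition smaller so each task's footprint shrinks.

```shell
# Option 1: step the cores up more gradually, keeping memory-per-core
# closer to the working 7-core configuration (12 is an illustrative value).
spark-submit --class com.test.StreamingJob --supervise \
  --master spark://{SPARK_MASTER_IP}:7077 \
  --executor-memory 30G \
  --executor-cores 12 --total-executor-cores 48 \
  /path/to/jar/spark-job.jar

# Option 2: keep 24 cores per executor but raise the partition count so
# each concurrently-processed partition is smaller (192 is illustrative):
spark-submit --class com.test.StreamingJob --supervise \
  --master spark://{SPARK_MASTER_IP}:7077 \
  --executor-memory 30G \
  --executor-cores 24 --total-executor-cores 96 \
  --conf spark.default.parallelism=192 \
  /path/to/jar/spark-job.jar
```

Given how small the input stream is here (under 100 events/second), the first option is likely the simpler fix: the extra cores buy little for this workload.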