Why does Spark Streaming need a certain number of CPU cores to run correctly?

The Spark Streaming documentation notes:

it is important to remember that a Spark Streaming application needs to be allocated enough cores to process the received data, as well as to run the receiver(s)

and then:

If the number of cores allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive data, but not be able to process them

This seems surprising, since the OS would normally schedule CPU time so that the application keeps making progress regardless of how many CPU cores are available, unless something actively prevents it from doing so. My questions are:

  • Does Spark do something special to prevent normal CPU scheduling?
  • If so, what's the rationale behind it?

I just realised that by 'core' they must mean 'thread'. If there are not enough threads, it will certainly lead to thread starvation. In line with this, I could create a local cluster with more 'cores' than the physical cores available (e.g. "local[10]" on a machine with only 4 CPU cores).
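
For illustration, here is a minimal, hypothetical sketch of such a local Spark Streaming app (the class name, host, and port are made up): the master URL "local[10]" requests 10 worker threads regardless of how many physical cores the machine has.

    // Hypothetical sketch: "local[10]" asks Spark for 10 local worker threads,
    // which it grants even on a machine with only 4 physical cores.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object LocalThreadsDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("local[10]")   // 10 threads, not 10 physical cores
          .setAppName("LocalThreadsDemo")
        val ssc = new StreamingContext(conf, Seconds(1))

        // The single socket receiver occupies one of the 10 threads for as long
        // as the application runs; the remaining threads process the batches.
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }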

No, it looks like the documentation is correct and means physical CPU cores, not threads. Starting six receivers on a 4-core machine causes the whole Spark Streaming application to stall, even with "local[10]". At the same time, the same app runs flawlessly on a machine with 8 cores.
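
For reference, a hedged sketch of the setup described above: six socket receivers started in a single application (hosts, ports, and the class name are assumptions, not from the original post). Each receiver runs as a long-lived task that holds on to one core/thread for the lifetime of the job, so the receiver count has to be weighed against the cores actually available.

    // Hypothetical sketch of the six-receiver setup; ports are made up.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SixReceiversDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("local[10]")   // 10 threads requested on a 4-core machine
          .setAppName("SixReceiversDemo")
        val ssc = new StreamingContext(conf, Seconds(1))

        // Each socketTextStream call starts its own long-running receiver,
        // so this creates six receivers competing for the available cores.
        val streams = (9990 to 9995).map(port => ssc.socketTextStream("localhost", port))
        val all = ssc.union(streams)
        all.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }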

