Dynamic resource allocation for a Spark application is not working

Question

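I'm new to Spark and I'm trying to figure out how dynamic resource allocation works. I have a Spark Structured Streaming application that reads millions of records from Kafka at a time and processes them. My application always starts with 3 executors and never increases the number of executors.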

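Processing takes 5-10 minutes to complete. I expected Spark to increase the number of executors (up to 10) and finish processing faster, but that doesn't happen. What am I missing here? How is this supposed to work?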

I have set the following properties for Spark in Ambari:

spark.dynamicAllocation.enabled = true
spark.dynamicAllocation.initialExecutors = 3
spark.dynamicAllocation.maxExecutors = 10
spark.dynamicAllocation.minExecutors = 3
spark.shuffle.service.enabled = true
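As a rough, stdlib-only sketch, the settings above can be sanity-checked. The object, case class and method names are hypothetical, and the checks are a simplified assumption modeled on what Spark 2.x validates when dynamic allocation starts. Under that model the settings are consistent and give a starting target of 3 executors with room to grow to 10, so the configuration by itself does not pin the application at 3:

```scala
// Hypothetical helper: a simplified model of the checks Spark 2.x applies to the
// spark.dynamicAllocation.* settings and of how it picks the starting executor count.
object DynamicAllocationConfigCheck {

  final case class DynAllocConf(
      enabled: Boolean,               // spark.dynamicAllocation.enabled
      minExecutors: Int,              // spark.dynamicAllocation.minExecutors
      initialExecutors: Int,          // spark.dynamicAllocation.initialExecutors
      maxExecutors: Int,              // spark.dynamicAllocation.maxExecutors
      shuffleServiceEnabled: Boolean, // spark.shuffle.service.enabled
      executorInstances: Option[Int] = None // spark.executor.instances / --num-executors
  )

  /** Returns the problems found; an empty list means the settings look consistent. */
  def problems(c: DynAllocConf): List[String] = {
    val checks = List(
      (!c.enabled) -> "spark.dynamicAllocation.enabled is false",
      (c.minExecutors < 0) -> "minExecutors must be >= 0",
      (c.maxExecutors <= 0) -> "maxExecutors must be > 0",
      (c.minExecutors > c.maxExecutors) -> "minExecutors must be <= maxExecutors",
      (!c.shuffleServiceEnabled) -> "the external shuffle service must be enabled"
    )
    checks.collect { case (true, msg) => msg }
  }

  /** Starting target: the largest of min, initial and spark.executor.instances. */
  def startingExecutors(c: DynAllocConf): Int =
    Seq(c.minExecutors, c.initialExecutors, c.executorInstances.getOrElse(0)).max

  def main(args: Array[String]): Unit = {
    // The settings from the question.
    val fromQuestion = DynAllocConf(
      enabled = true, minExecutors = 3, initialExecutors = 3,
      maxExecutors = 10, shuffleServiceEnabled = true)
    println(problems(fromQuestion))          // List()
    println(startingExecutors(fromQuestion)) // 3
  }
}
```

Passing `--num-executors` (i.e. `spark.executor.instances`) would, under this model, only raise the starting point; it is not needed for scaling.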

This is what my submit command looks like:

/usr/hdp/3.0.1.0-187/spark2/bin/spark-submit --class com.sb.spark.sparkTest.sparkTest --master yarn --deploy-mode cluster --queue default sparkTest-assembly-0.1.jar

Spark code:

import org.apache.spark.sql.streaming.Trigger

//read stream
// The Kafka source also needs one of "subscribe", "subscribePattern" or "assign" (not shown in the original)
val dsrReadStream = spark.readStream.format("kafka")
   .option("kafka.bootstrap.servers", brokers) // Kafka brokers
   .option("startingOffsets", startingOffsets) // start point to read
   .option("maxOffsetsPerTrigger", maxoffsetpertrigger) // max no. of offsets (records) per batch
   .option("failOnDataLoss", "true")
   .load() // turns the reader into a streaming DataFrame

 /****
 Logic to validate format of loglines. Writing invalid log lines to kafka and store valid log lines in 'dsresult'

 ****/

//write stream
val dswWriteStream = dsresult.writeStream
    .outputMode(outputMode) // file write mode, default append
    .format(writeformat) // file format, default orc
    .option("path", outPath) // HDFS file write path
    .option("checkpointLocation", checkpointdir) // checkpoint location
    .option("maxRecordsPerFile", 999999999)
    .trigger(Trigger.ProcessingTime(triggerTimeInMins)) // ProcessingTime(Long) is in milliseconds; for minutes pass e.g. "5 minutes"
    .start() // starts the streaming query

Answer 1

Dynamic resource allocation does not work with Spark Streaming.

Refer to this link.

Answer 2

Just to clarify further:

spark.streaming.dynamicAllocation.enabled=true

applies only to the DStreams API. See the JIRA.
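To show what the DStreams-only policy reacts to, here is a simplified model with hypothetical names; it is a sketch, not Spark's code. Roughly, once per scaling interval (`spark.streaming.dynamicAllocation.scalingInterval`, 60 s by default) it divides the average batch processing time by the batch interval, adds executors when that ratio reaches `scalingUpRatio` (default 0.9), and releases one when it drops to `scalingDownRatio` (default 0.3):

```scala
// Simplified model (hypothetical names) of the DStreams-only policy behind
// spark.streaming.dynamicAllocation.enabled. Not Spark's actual implementation.
object StreamingAllocationSketch {

  /** Executor target after one scaling round. */
  def nextTarget(
      current: Int,
      avgBatchProcMs: Long,
      batchIntervalMs: Long,
      minExecutors: Int,
      maxExecutors: Int,
      scalingUpRatio: Double = 0.9,  // default of spark.streaming.dynamicAllocation.scalingUpRatio
      scalingDownRatio: Double = 0.3 // default of spark.streaming.dynamicAllocation.scalingDownRatio
  ): Int = {
    val ratio = avgBatchProcMs.toDouble / batchIntervalMs
    if (ratio >= scalingUpRatio) {
      // Batches take (almost) the whole interval: add about `ratio` executors, at least 1.
      val toAdd = math.max(math.round(ratio).toInt, 1)
      math.max(math.min(maxExecutors, current + toAdd), minExecutors)
    } else if (ratio <= scalingDownRatio && current > minExecutors) {
      current - 1 // batches finish early: release one executor
    } else {
      current
    }
  }

  def main(args: Array[String]): Unit = {
    // 5-minute batch interval, batches taking 10 minutes: ratio 2.0, so 3 -> 5.
    println(nextTarget(3, 600000L, 300000L, 3, 10)) // 5
    // Batches taking 1 minute of a 5-minute interval: ratio 0.2, already at the minimum.
    println(nextTarget(3, 60000L, 300000L, 3, 10))  // 3
  }
}
```

Because this logic is tied to DStream batches, a Structured Streaming query never drives it.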

Also, if you set

spark.dynamicAllocation.enabled=true

and run a Structured Streaming job, the batch dynamic allocation algorithm kicks in, which may not be optimal. See the JIRA.
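The batch algorithm is driven by the task backlog rather than by how long a batch takes. Below is a simplified model of that request policy under stated assumptions: hypothetical names, a constant backlog, and `spark.task.cpus = 1`. Executors are requested in rounds while tasks stay pending (first after `spark.dynamicAllocation.schedulerBacklogTimeout`, then every `sustainedSchedulerBacklogTimeout`). The request size doubles each round (1, 2, 4, ...), and the target never exceeds what the running and pending tasks could occupy, nor `maxExecutors`:

```scala
// Hypothetical, simplified model of the core (batch) request policy enabled by
// spark.dynamicAllocation.enabled. Not Spark's actual implementation.
object BatchAllocationSketch {

  /** Executors needed to run all running + pending tasks at once (ceiling division). */
  def maxNeeded(runningOrPendingTasks: Int, executorCores: Int, taskCpus: Int = 1): Int = {
    val tasksPerExecutor = executorCores / taskCpus
    require(tasksPerExecutor > 0, "each executor must fit at least one task")
    (runningOrPendingTasks + tasksPerExecutor - 1) / tasksPerExecutor
  }

  /** Executor target after each request round, assuming the backlog stays constant. */
  def ramp(start: Int, maxExecutors: Int, runningOrPendingTasks: Int,
           executorCores: Int, rounds: Int): List[Int] = {
    // Never below the starting target, never above maxExecutors or what the tasks can use.
    val cap = math.max(start, math.min(maxExecutors, maxNeeded(runningOrPendingTasks, executorCores)))
    var target = start
    var toAdd = 1
    val targets = List.newBuilder[Int]
    for (_ <- 1 to rounds) {
      val next = math.min(target + toAdd, cap)
      // Double the request after a round that added its full amount; otherwise reset to 1.
      toAdd = if (next - target == toAdd) toAdd * 2 else 1
      target = next
      targets += target
    }
    targets.result()
  }

  def main(args: Array[String]): Unit = {
    // 40 tasks, 4 cores per executor: the backlog needs 10 executors.
    println(ramp(start = 3, maxExecutors = 10, runningOrPendingTasks = 40, executorCores = 4, rounds = 4))
    // List(4, 6, 10, 10)
    // 12 tasks, 4 cores per executor: 3 executors already cover the backlog.
    println(ramp(start = 3, maxExecutors = 10, runningOrPendingTasks = 12, executorCores = 4, rounds = 3))
    // List(3, 3, 3)
  }
}
```

Under this model, a micro-batch whose tasks all fit on the 3 running executors at once never raises the target, however long the batch takes.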

2 answers

Answer 1
Score 0, accepted, 2019-05-04 18:44:27

Answer 2
Score 0, 2020-01-13 21:33:30