Iterative code with long lineage RDD causes StackOverflowError in Apache Spark
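I am a beginner with Apache Spark. I am currently developing a machine learning program that needs to iteratively update an RDD and then collect nearly 10 KB of data from the executors to the driver. Unfortunately, I get a StackOverflowError when it runs for more than 600 iterations. My code is below. Once the number of iterations exceeds 400, the StackOverflowError is thrown in the collectAsMap call. Here indexedDevF and indexedData are IndexedRDDs (from the library developed by AMPLab, https://github.com/amplab/spark-indexedrdd).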
breakable{
while(bLow > bHigh + 2*tolerance){
indexedDevF = indexedDevF.innerJoin(indexedData){(id, a, b) => (b, a)}.mapValues( x => ( x._2 + alphaHighDiff * broad_y.value(iHigh) * kernel(x._1, dataiHigh) + alphaLowDiff * broad_y.value(iLow) * kernel(x._1, dataiLow) ) )
if (iteration % 50 == 0 ) {
indexedDevF.checkpoint()
}
indexedDevF.persist() // essential to get correct answer
val devFMap = indexedDevF.collectAsMap() // about 0.5 s per call according to local:4040; the StackOverflowError is thrown here
var min_value = Double.PositiveInfinity
var max_value = -min_value
var min_i = -1
var max_i = -1
i = 0
while( i < m ){
if(((y(i) > 0) && (alpha(i) < cEpsilon)) || ((y(i) < 0) && (alpha(i) > epsilon))){
if( devFMap(i) <= min_value){
min_value = devFMap(i)
min_i = i
}
}
if(((y(i) > 0) && (alpha(i) > epsilon)) || ((y(i) < 0) && (alpha(i) < cEpsilon))){
if( devFMap(i) >= max_value ){
max_value = devFMap(i)
max_i = i
}
}
i = i+1
}
iHigh = min_i
iLow = max_i
bHigh = devFMap(iHigh)
bLow = devFMap(iLow)
dataiHigh = indexedData.get(iHigh.toLong).get
dataiLow = indexedData.get(iLow.toLong).get
eta = 2 - 2 * kernel(dataiHigh, dataiLow)
alphaHighOld = alpha(iHigh)
alphaLowOld = alpha(iLow)
var alphaDiff = alphaLowOld - alphaHighOld
var lowLabel = y(iLow)
var sign = y(iHigh) * lowLabel
var alphaLowLowerBound = 0D
var alphaLowUpperBound = 0D
if (sign < 0){
if (alphaDiff < 0){
alphaLowLowerBound = 0;
alphaLowUpperBound = cost + alphaDiff;
}
else{
alphaLowLowerBound = alphaDiff;
alphaLowUpperBound = cost;
}
}
else{
var alphaSum = alphaLowOld + alphaHighOld;
if (alphaSum < cost){
alphaLowUpperBound = alphaSum;
alphaLowLowerBound = 0;
}
else{
alphaLowLowerBound = alphaSum - cost;
alphaLowUpperBound = cost;
}
}
if (eta > 0){
alphaLowNew = alphaLowOld + lowLabel*(bHigh - bLow)/eta;
if (alphaLowNew < alphaLowLowerBound)
alphaLowNew = alphaLowLowerBound;
else if (alphaLowNew > alphaLowUpperBound)
alphaLowNew = alphaLowUpperBound;
}
else{
var slope = lowLabel * (bHigh - bLow);
var delta = slope * (alphaLowUpperBound - alphaLowLowerBound);
if (delta > 0){
if (slope > 0)
alphaLowNew = alphaLowUpperBound;
else
alphaLowNew = alphaLowLowerBound;
}
else
alphaLowNew = alphaLowOld;
}
alphaLowDiff = alphaLowNew - alphaLowOld;
alphaHighDiff = -sign*(alphaLowDiff);
alpha(iLow) = alphaLowNew;
alpha(iHigh) = (alphaHighOld + alphaHighDiff);
if(iteration % 50 == 0)
print(".")
iteration = iteration + 1;
}
}
====================
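The original question follows. I found that checkpointing did not help, and the program ended with a StackOverflowError. I wrote a simple test program to describe my problem. Fortunately, someone kindly helped me solve it, and you can find the answer below. However, even though checkpointing now works, my real program still runs into the StackOverflowError :(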
for(i <- 1 to 1000){
a = a.map(x => x+1).persist
var b = a.collect()
if(i%100 == 0){
a.checkpoint()
}
print(".")
}
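Looking at the documentation for RDD.checkpoint, it says:
This function must be called before any job has been executed on this RDD.
Indeed, if you change your code slightly so that the checkpoint is requested before collecting a, it runs without a StackOverflowError: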
for(i <- 1 to 1000){
a = a.map(x => x+1).persist
if(i%100 == 0){
a.checkpoint()
}
var b = a.collect()
print(".")
}
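For anyone who wants to try this outside the shell, here is a minimal self-contained sketch of the same pattern. It is not the asker's program: the object name CheckpointBeforeAction, the run helper, the local[2] master, and the temporary checkpoint directory are all assumptions made for illustration. It marks the RDD for checkpointing before the first action runs on it, then uses isCheckpointed to confirm that the checkpoint really happened.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical demo object; names and parameters are illustrative only.
object CheckpointBeforeAction {

  /** Applies `iterations` map steps, marking every `interval`-th RDD for
    * checkpointing BEFORE the first action runs on it.
    * Returns the final values and whether the last marked RDD was checkpointed.
    */
  def run(sc: SparkContext, iterations: Int, interval: Int): (Array[Int], Boolean) = {
    var a: RDD[Int] = sc.parallelize(1 to 4, 2)
    var lastMarked: Option[RDD[Int]] = None

    for (i <- 1 to iterations) {
      a = a.map(_ + 1).persist()
      if (i % interval == 0) {
        a.checkpoint()        // must come before any job on this RDD
        lastMarked = Some(a)
      }
      a.collect()             // first action on `a`; triggers the checkpoint if marked
    }
    (a.collect(), lastMarked.exists(_.isCheckpointed))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with a throwaway checkpoint directory.
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoint").toString)
      val (values, checkpointed) = run(sc, iterations = 1000, interval = 100)
      println(values.mkString(", "))
      println(s"last marked RDD checkpointed: $checkpointed")
    } finally {
      sc.stop()
    }
  }
}
```

If the checkpoint is not taking effect in your own job, printing a.toDebugString every few iterations is a quick way to see whether the lineage is actually being cut. In a long-running job you would probably also want to unpersist RDDs from earlier iterations, which this sketch leaves out.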