
Measuring Spark MLlib classification algorithms' prediction time

I'm trying to measure the training and prediction time of the MLlib classification algorithms.

I'm now running my code against 11,000,000 records, and the prediction time is the same as for only 1,000 records (~20 ms). Does the transform method work in some lazy mode?

Code I used:

// Train the model.
BenchmarkUtil.startTime()
val trainModel = pipeline.fit(trainingData)
val trainTime = BenchmarkUtil.getProcessingTime()
println(className + " Train time [ms]: " + trainTime)

// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData)
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)

Sample output for 11,000,000 records (80% training / 20% test split):

RandomForrestClassifierAlgorithm$ Train time [ms]: 2547637
RandomForrestClassifierAlgorithm$ Prediction time [ms]: 20

It turned out that I have to perform an action on the transformed data in order to actually trigger the transformation. Spark DataFrame transformations are lazy: transform only builds the execution plan, and nothing is computed until an action runs, which is why the call returned in ~20 ms regardless of the data size.

When I collect the transformed data, it works fine. Code after the change:

// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData)
predictions.collect() // action: forces Spark to actually execute the transformation
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)
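Note that collect() materializes the entire prediction DataFrame on the driver, so the measured time includes driver-side transfer, and a very large test set can exhaust driver memory. A lighter-weight action such as count() also forces the full transformation to execute without copying rows to the driver. A minimal sketch of that variant, assuming the same trainModel, testData, and BenchmarkUtil helper as above:

```scala
// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData) // lazy: only builds the plan
predictions.count() // action: executes the transformation, keeps rows on the executors
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)
```

Either action works for benchmarking; count() just keeps the measurement closer to pure prediction time on the cluster.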
