
Measuring Spark MLlib classification algorithms' prediction time

I'm trying to measure the training and prediction time of the MLlib classification algorithms.

I'm now running my code against 11,000,000 records, and the prediction time is the same as for only 1,000 records (~20 ms). Does the transform method work in some lazy mode?

Code I used:

// Train the model.
BenchmarkUtil.startTime()
val trainModel = pipeline.fit(trainingData)
val trainTime = BenchmarkUtil.getProcessingTime()
println(className + " Train time [ms]: " + trainTime)

// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData)
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)

Sample output for 11,000,000 records (80% training / 20% test split):

RandomForrestClassifierAlgorithm$ Train time [ms]: 2547637
RandomForrestClassifierAlgorithm$ Prediction time [ms]: 20

It turned out that I have to perform an action on the transformed data in order to actually trigger the transformation. Spark DataFrame transformations are lazy: transform only builds the execution plan, and nothing is computed until an action runs, which is why the call returned in ~20 ms regardless of the data size.

When I collect the transformed data, it works fine. Code after the change:

// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData)
predictions.collect() // action: forces Spark to actually execute the transformation
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)
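Note that collect() materializes the entire prediction DataFrame on the driver, so the measured time includes driver-side transfer, and a very large test set can exhaust driver memory. A lighter-weight action such as count() also forces the full transformation to execute without copying rows to the driver. A minimal sketch of that variant, assuming the same trainModel, testData, and BenchmarkUtil helper as above:

```scala
// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData) // lazy: only builds the plan
predictions.count() // action: executes the transformation, keeps rows on the executors
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)
```

Either action works for benchmarking; count() just keeps the measurement closer to pure prediction time on the cluster.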
