
Measure Spark MLlib classification algorithms' prediction time

I'm trying to measure the training and prediction time of MLlib classification algorithms.

I'm now running my code against 11,000,000 records, and the prediction time is the same as for only 1,000 records (~20 ms). Does the transform method work lazily?

Code I used:

BenchmarkUtil.startTime()
val trainModel = pipeline.fit(trainingData)
val trainTime = BenchmarkUtil.getProcessingTime()
println(className + " Train time [ms]: " + trainTime)

// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData)
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)

Sample output for 11,000,000 records, split into 80% training data and 20% test data:

RandomForrestClassifierAlgorithm$ Train time [ms]: 2547637
RandomForrestClassifierAlgorithm$ Prediction time [ms]: 20

It turned out that the transform method is lazy: I have to run an action on the transformed data for the transformation to actually execute.
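The laziness can be reproduced on a toy dataset. The following is a minimal sketch, not the original benchmark: the object name LazyTransformDemo, the timed helper, the four-row dataset and the LogisticRegression stage are all assumptions made for illustration, and it assumes Spark with spark-mllib on the classpath, run in local mode.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object LazyTransformDemo {
  // Hypothetical helper (not part of Spark): runs a block and returns
  // its result together with the elapsed wall-clock time in milliseconds.
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lazy-transform-demo")
      .getOrCreate()

    // Tiny made-up dataset, only large enough to fit a model.
    val data = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 1.1)),
      (1.0, Vectors.dense(2.0, 1.0)),
      (0.0, Vectors.dense(0.1, 1.2)),
      (1.0, Vectors.dense(2.2, 0.9))
    )).toDF("label", "features")

    val pipeline = new Pipeline().setStages(Array(new LogisticRegression().setMaxIter(5)))
    val model = pipeline.fit(data)

    // transform only describes the computation; no rows are scored yet.
    val (predictions, planMs) = timed(model.transform(data))

    // collect is an action, so the predictions are computed here.
    val (rows, runMs) = timed(predictions.collect())

    println(s"transform only [ms]: $planMs")
    println(s"transform + collect of ${rows.length} rows [ms]: $runMs")

    spark.stop()
  }
}
```

On a dataset this small the two timings are not meaningful as a benchmark; the point is only that the scoring work happens inside the action, not inside transform.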

When I collect the transformed data, it works fine. Code after the change:

// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData)
predictions.collect()
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)
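One caveat with the code above: collect() also copies every predicted row to the driver, so for millions of rows the measured time includes that transfer and the driver needs enough memory to hold the result. A possible alternative, sketched here under assumptions, is an action that computes each row on the executors and discards it there. The object and function names below are hypothetical, and the sketch uses System.nanoTime instead of the BenchmarkUtil from the question.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.DataFrame

object PredictionTiming {
  // Hypothetical helper: returns the elapsed milliseconds needed to
  // score every row of testData with the given fitted pipeline.
  def predictionTimeMs(model: PipelineModel, testData: DataFrame): Long = {
    val start = System.nanoTime()
    // rdd.foreach is an action: each row, including its prediction columns,
    // is computed on the executors and then dropped, so nothing is
    // collected on the driver.
    model.transform(testData).rdd.foreach(_ => ())
    (System.nanoTime() - start) / 1000000L
  }
}
```

With the names from the question, this would be called as PredictionTiming.predictionTimeMs(trainModel, testData). If testData is not cached, the measured time still includes reading and preparing the input, so it may be worth caching and materializing testData before starting the timer.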
