[英]Spark Logistic regression and metrics
我想进行逻辑回归100次,随机分为测试和培训。 然后,我想保存各个运行的性能指标,然后在以后使用它们来深入了解性能。
for (index <- 1 to 100) {
val splits = training_data.randomSplit(Array(0.90, 0.10), seed = index)
val training = splits(0).cache()
val test = splits(1)
logrmodel = train_LogisticRegression_model(training)
performLogisticRegressionRuns(logrmodel, test, index)
}
spark.stop()
}
def performLogisticRegressionRuns(model: LogisticRegressionModel, test: RDD[LabeledPoint], iterationcount: Int) {
private val sb = StringBuilder.newBuilder
// Compute raw scores on the test set. Once I cle
model.clearThreshold()
val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
val prediction = model.predict(features)
(prediction, label)
}
val bcmetrics = new BinaryClassificationMetrics(predictionAndLabels)
// I am showing two sample metrics, but I am collecting more including recall, area under roc, f1 score etc....
val precision = bcmetrics.precisionByThreshold()
precision.foreach { case (t, p) =>
// If threshold is 0.5 as what we want, then get the precision and append it to the string. Idea is if score is <0.5 class 0, else class 1.
if (t == 0.5) {
println(s"Threshold is: $t, Precision is: $p")
sb ++= p.toString() + "\t"
}
}
val auROC = bcmetrics.areaUnderROC
sb ++= iteration + auPRC.toString() + "\t"
我想将每次迭代的性能结果保存在单独的文件中。 我试过了,但是没有用,任何帮助都很好
val data = spark.parallelize(sb)
val filename = "logreg-metrics" + iterationcount.toString() + ".txt"
data.saveAsTextFile(filename)
}
我能够解决此问题,我做了以下工作。 我将字符串转换为列表。
val data = spark.parallelize(List(sb))
val filename = "logreg-metrics" + iterationcount.toString() + ".txt"
data.saveAsTextFile(filename)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.