
Calculation of areaUnderROC of logistic regression model in Spark

I have a logistic regression model in Spark.
I want to extract the probability for label=1 from the output vector and calculate the areaUnderROC.

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

val assembler = new VectorAssembler()
  .setInputCols(Array("A", "B", "C", "D", "E")) // for example
  .setOutputCol("features")

val data = assembler.transform(logregdata)

val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 12345)
val training1 = training.select("label", "features")
val test1 = test.select("label", "features")

val lr = new LogisticRegression()
val model = lr.fit(training1)
val results = model.transform(test1)
results.show()

+-----+--------------------+--------------------+--------------------+----------+
|label|            features|       rawPrediction|         probability|prediction|
+-----+--------------------+--------------------+--------------------+----------+
|  0.0|(54,[13,31,34,35,...|[2.44227333947447...|[0.91999457581425...|       0.0|

import org.apache.spark.mllib.evaluation.MulticlassMetrics

val predictionAndLabels =results.select($"probability",$"label").as[(Double,Double)].rdd
val metrics = new MulticlassMetrics(predictionAndLabels)
val auROC= metrics.areaUnderROC()

The probability looks like this: [0.9199945758142595,0.0800054241857405]
How can I extract the probability for label=1 from the vector and calculate the AUC?

You could get the value from the underlying RDD. This would return a tuple with your original label and the predicted value for P(label=1):

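// Index 1 of the probability vector is P(label=1); index 0 is P(label=0). Vector is org.apache.spark.ml.linalg.Vector in Spark 2.x+.
val predictions = results.rdd.map(row => (row.getAs[Double]("label"), row.getAs[Vector]("probability")(1)))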
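To get from those pairs to an AUC, one option is `BinaryClassificationMetrics`, which exposes `areaUnderROC()` (`MulticlassMetrics` does not). The following is a minimal sketch, assuming Spark 2.x or later (so the probability column holds `org.apache.spark.ml.linalg.Vector`) and the default column names `label`, `probability` and `rawPrediction`. The helper names `areaUnderRoc` and `areaUnderRocViaEvaluator` are made up for illustration. Note that `BinaryClassificationMetrics` expects the score first and the label second.

```scala
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.DataFrame

// Hypothetical helper: `results` is assumed to be the output of model.transform(test1).
def areaUnderRoc(results: DataFrame): Double = {
  // Build (score, label) pairs, where score = P(label=1) = element 1 of the vector.
  val scoreAndLabels = results.select("probability", "label").rdd.map { row =>
    (row.getAs[Vector]("probability")(1), row.getAs[Double]("label"))
  }
  new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()
}

// Hypothetical alternative that stays in the DataFrame API and skips the manual extraction.
def areaUnderRocViaEvaluator(results: DataFrame): Double =
  new BinaryClassificationEvaluator()
    .setLabelCol("label")
    .setRawPredictionCol("rawPrediction")
    .setMetricName("areaUnderROC")
    .evaluate(results)
```

With the `results` DataFrame from the question, `val auROC = areaUnderRoc(results)` would give the value you were trying to compute. The evaluator variant is probably the simpler choice if you do not need the RDD for anything else.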
