How to get the coefficients of the best logistic regression in a spark-ml CrossValidatorModel?
I train a simple CrossValidatorModel using logistic regression and spark-ml pipelines. I can predict on new data, but I'd like to go beyond the black box and analyze the coefficients.
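import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
// Elastic-net logistic regression; maxIter and alpha are defined elsewhere
val lr = new LogisticRegression().
  setFitIntercept(true).
  setMaxIter(maxIter).
  setElasticNetParam(alpha).
  setStandardization(true).
  setFamily("binomial").
  setWeightCol("weight").
  setFeaturesCol("features").
  setLabelCol("response")
// Assemble the raw feature columns into a single vector column
val assembler = new VectorAssembler().
  setInputCols(Array("feat1", "feat2")).
  setOutputCol("features")
val modelPipeline = new Pipeline().
  setStages(Array(assembler, lr))
// The evaluator's default metric is areaUnderROC
val evaluator = new BinaryClassificationEvaluator().
  setLabelCol("response")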
Then I define a parameter grid and train over it to select the best model with respect to AUC:
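import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
// lambdas holds the candidate regularization strengths; nfolds and train are defined elsewhere
val paramGrid = new ParamGridBuilder().
  addGrid(lr.regParam, lambdas).
  build()
val pipeline = new CrossValidator().
  setEstimator(modelPipeline).
  setEvaluator(evaluator).
  setEstimatorParamMaps(paramGrid).
  setNumFolds(nfolds)
val cvModel = pipeline.fit(train)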
How do I get the coefficients (the betas) of the best logistic regression model?
Extract the best model:
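import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
// bestModel is typed as Model[_], so match to recover the PipelineModel
val bestModel = cvModel.bestModel match {
  case pm: PipelineModel => Some(pm)
  case _ => None
}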
Find the logistic regression model among its stages:
val lrm = bestModel
  .map(_.stages.collect { case lrm: LogisticRegressionModel => lrm })
  .flatMap(_.headOption)
Extract the intercept and coefficients:
lrm.map(m => (m.intercept, m.coefficients))
Quick and dirty equivalent (it throws if the casts do not hold):
val lrm: LogisticRegressionModel = cvModel
  .bestModel.asInstanceOf[PipelineModel]
  .stages
  .last.asInstanceOf[LogisticRegressionModel]
(lrm.intercept, lrm.coefficients)
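Once the intercept and coefficients are extracted, a small helper can make them easier to read. The sketch below is a hypothetical helper, not part of Spark: it pairs each coefficient with a feature name and with the odds ratio exp(beta), a common way to read a binomial logistic regression coefficient. It assumes the coefficient order matches the column order passed to `setInputCols` and that the coefficient vector has been converted to a plain array with `toArray`.

```scala
// Hypothetical helper (not a Spark API): one row per feature.
final case class Term(name: String, beta: Double, oddsRatio: Double)

def describeCoefficients(names: Array[String], betas: Array[Double]): Seq[Term] = {
  // Assumption: names are given in the same order as the assembled feature vector.
  require(names.length == betas.length, "expected one name per coefficient")
  names.zip(betas).map { case (n, b) => Term(n, b, math.exp(b)) }.toSeq
}

// Possible usage with the model extracted above:
//   describeCoefficients(Array("feat1", "feat2"), lrm.coefficients.toArray)
//     .foreach(println)
```

With regularization in the grid, some coefficients may be shrunk toward zero, so an odds ratio close to 1 can reflect the penalty as much as a weak feature.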