
How to get the p-value for logistic regression in Spark MLlib using Java

How can I get the p-value for logistic regression in Spark MLlib using Java? And how can I find the probability of the predicted class? Below is the code I have tried:

SparkConf sparkConf = new SparkConf().setAppName("GRP").setMaster("local[*]");
SparkContext ctx = new SparkContext(sparkConf);

LabeledPoint pos = new LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0));
String path = "dataSetnew.txt";

JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(ctx, path).toJavaRDD();
JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[] {0.6, 0.4}, 11L);
JavaRDD<LabeledPoint> training = splits[0].cache();
JavaRDD<LabeledPoint> test = splits[1];   

final org.apache.spark.mllib.classification.LogisticRegressionModel model = 
    new LogisticRegressionWithLBFGS()
        .setNumClasses(2)
        .setIntercept(true)
        .run(training.rdd());    

JavaRDD<Tuple2<Object, Object>> predictionAndLabels = test.map(
    new org.apache.spark.api.java.function.Function<LabeledPoint, Tuple2<Object, Object>>() {
        public Tuple2<Object, Object> call(LabeledPoint p) {
          Double prediction = model.predict(p.features());
         // System.out.println("prediction :"+prediction);
          return new Tuple2<Object, Object>(prediction, p.label());
        }
      }
    );   

Vector denseVecnew = Vectors.dense(112,110,110,0,0,0,0,0,0,0,0);
Double prediction = model.predict(denseVecnew);
Vector weightVector = model.weights();          
System.out.println("weights : "+weightVector);           
System.out.println("intercept : "+model.intercept());       
System.out.println("forecast”+ prediction);    
ctx.stop();

For binary classification you can use the LogisticRegressionModel.clearThreshold method. Once it has been called, predict returns raw scores instead of labels; these scores fall in the range [0, 1] and can be interpreted as probabilities.

See the clearThreshold documentation.
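
A minimal sketch of this in Java, reusing the model and denseVecnew variables from the question's code above (the 0.5 threshold restored at the end is just an example value):

// Clear the decision threshold so predict() returns the raw score
// (for binary logistic regression this is P(label = 1 | features)).
model.clearThreshold();
double probability = model.predict(denseVecnew);   // value in [0, 1]
System.out.println("P(label = 1) : " + probability);

// Restore a threshold if 0/1 label predictions are needed again.
model.setThreshold(0.5);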
