Multiclass classification: better display of raw predictions using Spark in Scala

Question

I'm working with the Iris dataset (LogisticRegressionWithLBFGS(), multiclass classification). I pulled my data into an RDD, converted it to a DataFrame, and did some tidying up on it. I created a label index on the Iris plant class/label field and a feature vector from the other fields. I then took these two fields of the DataFrame and converted them into an RDD of LabeledPoint instances, which I can feed into LogisticRegressionWithLBFGS().

Here is some predictor code:

val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(10)
  .setIntercept(true)
  .setValidateData(true)
  .run(training)

Scores and labels:

val scoreAndLabels_ofTrain = training.map {
  point =>
    val score = model.predict(point.features)
    (score, point.label)
}

I wanted to see the predictions:

scoreAndLabels_ofTrain.take(200).foreach(println)

The only problem is that I pretty much got this example from a book. I was hoping to see a dataset that shows the feature columns, what the predicted number was, what probability score it gave, etc. I'd imagine I'd need to convert the label index back if I wanted to see the string data it represents.

How do I get better-looking, tabular data, as close as possible to the original dataset, with the predictions alongside? I think I'm missing a trick here somewhere.

The output of the above looks like:

(2.0,2.0)
(2.0,2.0)
(2.0,2.0)
(2.0,2.0)
(2.0,2.0)
...

What does this even mean? I'm not sure how to read/interpret the data. For the first line, is it saying that it predicted "2.0" and the actual label was "2.0"? Am I understanding it right?
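One way to check this reading of the pairs is to summarize them rather than print them one by one. The following is a minimal sketch, assuming each tuple is ordered (prediction, label) as in the map above; the helper name summarize is hypothetical, and MulticlassMetrics comes from the RDD-based MLlib evaluation package.

```scala
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.rdd.RDD

// Hypothetical helper: summarizes an RDD of (prediction, label) pairs.
def summarize(predictionAndLabels: RDD[(Double, Double)]): Unit = {
  val metrics = new MulticlassMetrics(predictionAndLabels)
  // Fraction of rows where the predicted class equals the true class.
  println(s"Accuracy: ${metrics.accuracy}")
  // Rows are true classes, columns are predicted classes, ordered by class index.
  println(metrics.confusionMatrix)
}
```

It could be called as summarize(scoreAndLabels_ofTrain). Because the pairs here come from the training data itself, the resulting numbers describe fit on seen data, not generalization.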

Answer 1

Yes. When you apply the map to the input dataset and make a prediction for each element, you get an RDD[(Double, Double)] of (prediction, label) pairs. However, you are using the MLlib (RDD-based) LR implementation. You can use the DataFrame implementation directly. Take a look at the example. The fit function optimizes the model and returns a LogisticRegressionModel. Apply the transform method to your input DataFrame and a new column with the prediction will be added.
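The DataFrame-based approach could look roughly like the sketch below. It is not the asker's code: the file name iris.csv, the column names (sepal_length, sepal_width, petal_length, petal_width, species) and the output column predictedSpecies are all assumptions for illustration, and labelsArray assumes Spark 3.x (older versions expose labels instead). The pipeline indexes the class column, assembles the features, fits the model, and maps the numeric prediction back to the original class string.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object IrisPredictions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("iris-predictions")
      .master("local[*]")
      .getOrCreate()

    // Assumed input: a CSV with a header row and the column names used below.
    val iris = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("iris.csv")

    // Turn the class string into a numeric label; fit now so the label order is known.
    val indexer = new StringIndexer()
      .setInputCol("species")
      .setOutputCol("label")
      .fit(iris)

    // Collect the numeric columns into a single feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("sepal_length", "sepal_width", "petal_length", "petal_width"))
      .setOutputCol("features")

    // Uses the default "label", "features" and "prediction" column names.
    val lr = new LogisticRegression().setMaxIter(100)

    // Map the numeric prediction back to the original class string.
    // Spark 3.x assumption: on older versions use indexer.labels instead.
    val converter = new IndexToString()
      .setInputCol("prediction")
      .setOutputCol("predictedSpecies")
      .setLabels(indexer.labelsArray(0))

    val model = new Pipeline()
      .setStages(Array(indexer, assembler, lr, converter))
      .fit(iris)

    // transform keeps the original columns and appends the prediction columns.
    model.transform(iris)
      .select("sepal_length", "sepal_width", "petal_length", "petal_width",
        "species", "predictedSpecies", "probability")
      .show(10, truncate = false)

    spark.stop()
  }
}
```

The probability column holds one value per class, in label-index order, which covers the "what probability score it gave" part of the question. As in the question, this sketch predicts on the same rows it was trained on; a held-out split (for example via randomSplit) would give a fairer picture.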

Solution 1
1 accepted 2021-03-08 12:18:59
