I am running Spark's logistic regression example from here, for Scala.
In the training part:
val model = new LogisticRegressionWithLBFGS().setNumClasses(10).run(training)
the number of classes is set to 10. If my data instead consists of 3 labels (5, 12, and 20), it raises an exception:
ERROR DataValidators: Classification labels should be in {0 to 9}. Found 6 invalid labels.
I know that I can resolve it by setting setNumClasses to a value larger than the largest label value.
Is it possible to run this algorithm with the true number of classes on such a dataset, without explicitly transforming the label values?
If I set numClasses high enough just to make it work, can the algorithm predict non-existent classes, such as 17 in the example above?
I think the best thing you can do is to map over your training data and, using a Map, replace each label with one of the values 0.0, 1.0, 2.0, ..., n - 1, where n is the number of classes:
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
val rdd = sc.parallelize(List(
  LabeledPoint(5.0, Vectors.dense(1, 2)),
  LabeledPoint(12.0, Vectors.dense(1, 3)),
  LabeledPoint(20.0, Vectors.dense(-1, 4))))

// Map the original labels onto consecutive class indices 0.0, 1.0, 2.0.
// Note the keys must be Doubles (5.0, not 5) to match LabeledPoint labels.
val map = Map(5.0 -> 0.0, 12.0 -> 1.0, 20.0 -> 2.0)

val trainingData = rdd.map {
  case LabeledPoint(category, features) => LabeledPoint(map(category), features)
}
val model = new LogisticRegressionWithLBFGS().setNumClasses(3).run(trainingData)
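If you then want predictions expressed in terms of the original labels, you can invert the same Map and translate the model's output back. A small sketch continuing the snippet above (the feature vector passed to predict is just an illustrative example):

```scala
// Invert the label map: yields Map(0.0 -> 5.0, 1.0 -> 12.0, 2.0 -> 20.0)
val inverse = map.map(_.swap)

// model.predict returns one of the mapped class indices (0.0, 1.0 or 2.0),
// so looking it up in the inverted map recovers the original label.
val prediction = model.predict(Vectors.dense(1, 2))
val originalLabel = inverse(prediction)
```

This also answers the second question: the model trained this way can only ever output one of the n mapped indices, so after translation you only ever see your real labels.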