Naive Bayes with Apache Spark MLlib

Question

I'm using Naive Bayes with Apache Spark MLlib for text classification, following this tutorial: http://avulanov.blogspot.com/2014/08/text-classification-with-apache-spark.html

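import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

/* instantiate Spark context (not needed when running inside the Spark shell) */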
val sc = new SparkContext("local", "test")
/* word to vector space converter, limit to 10000 words */
val htf = new HashingTF(10000)
/* load positive and negative sentences from the dataset */
/* let 1 - positive class, 0 - negative class */
/* tokenize sentences and transform them into vector space model */
val positiveData = sc.textFile("/data/rt-polaritydata/rt-polarity.pos")
  .map { text => new LabeledPoint(1, htf.transform(text.split(" ")))}
val negativeData = sc.textFile("/data/rt-polaritydata/rt-polarity.neg")
  .map { text => new LabeledPoint(0, htf.transform(text.split(" ")))}
/* split the data 60% for training, 40% for testing */
val posSplits = positiveData.randomSplit(Array(0.6, 0.4), seed = 11L)
val negSplits = negativeData.randomSplit(Array(0.6, 0.4), seed = 11L)
/* union train data with positive and negative sentences */
val training = posSplits(0).union(negSplits(0))
/* union test data with positive and negative sentences */
val test = posSplits(1).union(negSplits(1))
/* Multinomial Naive Bayesian classifier */
val model = NaiveBayes.train(training)
/* predict */
val predictionAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}
/* metrics */
val metrics = new MulticlassMetrics(predictionAndLabels)
/* output F1-measure for all labels (0 and 1, negative and positive) */
metrics.labels.foreach( l => println(metrics.fMeasure(l)))

But after training the model, what should I do if I want to know whether the sentence "Have a nice day" is positive or negative? Thank you.

Answer 1

Generally speaking, you need two things to make a prediction on raw data:

Apply the same transformations you used for the training data. If a transformer requires fitting (like IDF, normalization, or encoding), you have to use the one that was fitted on the training data. Since your approach is extremely simple, all you need here is something like this:
```
val testData = htf.transform("Have a nice day".split(" "))
```
Use the predict method of the trained model:
```
model.predict(testData)
```
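
For the case where a transformer does need fitting, here is a minimal sketch of how the two steps might fit together. It goes beyond the pipeline in the question: the IDF step, the lowercasing `tokenize` helper, the `SentencePrediction` object, its `train` method, and the four-sentence corpus are all made up for illustration, so treat it as one possible arrangement rather than the tutorial's method.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of the question or the tutorial.
object SentencePrediction {

  // One tokenizer shared by training and prediction (lowercasing is an assumption).
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Trains on (label, sentence) pairs and returns a function that scores raw text.
  def train(data: RDD[(Double, String)], numFeatures: Int = 10000): String => Double = {
    val htf = new HashingTF(numFeatures)

    // IDF has to be fitted, and it is fitted on the training data only.
    val tf = data.map { case (_, text) => htf.transform(tokenize(text)) }
    val idfModel = new IDF().fit(tf)

    val training = data.map { case (label, text) =>
      LabeledPoint(label, idfModel.transform(htf.transform(tokenize(text))))
    }
    val model = NaiveBayes.train(training)

    // Prediction reuses the same tokenizer, the same HashingTF and the fitted IDF model.
    text => model.predict(idfModel.transform(htf.transform(tokenize(text))))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("sentence-prediction")
    val sc = new SparkContext(conf)
    try {
      // Tiny made-up corpus: 1.0 = positive, 0.0 = negative.
      val data = sc.parallelize(Seq(
        (1.0, "what a nice and lovely day"),
        (1.0, "a wonderful and charming film"),
        (0.0, "a dull and boring mess"),
        (0.0, "terrible plot and awful acting")
      ))
      val predict = train(data)
      println(predict("Have a nice day")) // prints the predicted label, 0.0 or 1.0
    } finally {
      sc.stop()
    }
  }
}
```

Returning a closure from `train` keeps the hashing, the fitted IDF model, and the classifier bundled together, so a raw sentence cannot accidentally be scored with a transformer other than the one used for training.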

Answer 1
3 2015-10-13 11:29:34
