Naive Bayes with Apache Spark MLlib

I am using Naive Bayes with Apache Spark MLlib for text classification, following this tutorial: http://avulanov.blogspot.com/2014/08/text-classification-with-apache-spark.html

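import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

/* instantiate Spark context (not needed when running inside the Spark shell) */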
val sc = new SparkContext("local", "test")
/* word to vector space converter, limit to 10000 words */
val htf = new HashingTF(10000)
/* load positive and negative sentences from the dataset */
/* let 1 - positive class, 0 - negative class */
/* tokenize sentences and transform them into vector space model */
val positiveData = sc.textFile("/data/rt-polaritydata/rt-polarity.pos")
  .map { text => new LabeledPoint(1, htf.transform(text.split(" ")))}
val negativeData = sc.textFile("/data/rt-polaritydata/rt-polarity.neg")
  .map { text => new LabeledPoint(0, htf.transform(text.split(" ")))}
/* split the data 60% for training, 40% for testing */
val posSplits = positiveData.randomSplit(Array(0.6, 0.4), seed = 11L)
val negSplits = negativeData.randomSplit(Array(0.6, 0.4), seed = 11L)
/* union train data with positive and negative sentences */
val training = posSplits(0).union(negSplits(0))
/* union test data with positive and negative sentences */
val test = posSplits(1).union(negSplits(1))
/* Multinomial Naive Bayesian classifier */
val model = NaiveBayes.train(training)
/* predict */
val predictionAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}
/* metrics */
val metrics = new MulticlassMetrics(predictionAndLabels)
/* output F1-measure for all labels (0 and 1, negative and positive) */
metrics.labels.foreach( l => println(metrics.fMeasure(l)))

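This trains and evaluates the model. But once it is trained, how do I find out whether a sentence such as "Have a nice day" is positive or negative? Thanks.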

In general, you need two things to make predictions on raw data:

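  1. Apply the same transformations that were applied to the training data. If a transformer has to be fitted (IDF, normalization, encoding), use the transformer that was fitted on the training data. Since your approach is extremely simple, all you need is something like this: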

     val testData = htf.transform("Have a nice day".split(" ")) 
  2. Use the predict method of the trained model:

     model.predict(testData) 
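The two steps above can be bundled so that new text always goes through exactly the transforms used in training. The following is a minimal, hypothetical sketch, assuming the RDD-based `spark.mllib` API from the question (Spark 1.3 or later, since it calls `IDFModel.transform` on a single vector). `SentimentPipeline`, `SentimentTrainer` and `PredictDemo` are illustrative names, not MLlib classes, and the IDF step is an addition the tutorial code does not have; it is there only to show how a fitted transformer is reused at prediction time.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.mllib.feature.{HashingTF, IDF, IDFModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of MLlib): holds every fitted piece that
// prediction needs, so new text goes through exactly the training transforms.
case class SentimentPipeline(htf: HashingTF, idfModel: IDFModel, model: NaiveBayesModel) {

  // Raw text -> term frequencies -> IDF weights fitted on the training set.
  def featurize(text: String): Vector =
    idfModel.transform(htf.transform(SentimentTrainer.tokenize(text)))

  // Returns the training label: 1.0 = positive, 0.0 = negative.
  def predict(text: String): Double = model.predict(featurize(text))
}

// Hypothetical trainer object (illustrative name).
object SentimentTrainer {
  // Same tokenizer as the question's code: split on single spaces.
  def tokenize(text: String): Seq[String] = text.split(" ").toSeq

  // positive / negative: raw sentences, e.g. loaded with sc.textFile(...).
  def fit(positive: RDD[String], negative: RDD[String]): SentimentPipeline = {
    val htf = new HashingTF(10000)
    val tf: RDD[(Double, Vector)] =
      positive.map(t => (1.0, htf.transform(tokenize(t))))
        .union(negative.map(t => (0.0, htf.transform(tokenize(t)))))
        .cache()
    // Fit IDF on the training term frequencies only; the fitted model is
    // reused unchanged for every sentence classified later.
    val idfModel = new IDF().fit(tf.map(_._2))
    val training = tf.map { case (label, v) => LabeledPoint(label, idfModel.transform(v)) }
    SentimentPipeline(htf, idfModel, NaiveBayes.train(training))
  }
}

object PredictDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("nb-predict"))
    // Dataset paths taken from the tutorial code in the question.
    val pipeline = SentimentTrainer.fit(
      sc.textFile("/data/rt-polaritydata/rt-polarity.pos"),
      sc.textFile("/data/rt-polaritydata/rt-polarity.neg"))
    val label = pipeline.predict("Have a nice day")
    println(if (label == 1.0) "positive" else "negative")
    sc.stop()
  }
}
```

To reproduce the tutorial's pipeline exactly, drop the IDF lines and pass the raw term-frequency vectors to `NaiveBayes.train`. Keeping the tokenizer, `HashingTF` and `IDFModel` together in one object makes it hard to classify new text with a transform that differs from the training one. Multinomial Naive Bayes also requires non-negative feature values, which IDF-weighted term frequencies satisfy.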

