Streaming KMeans Spark Java

Hi. Basically, we want to use Kafka + Spark Streaming to catch Twitter spam for our thesis, and I want to use StreamingKMeans. But I have a very basic, yet serious, question:

In this Spark StreamingKMeans Scala example ( https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeansExample.scala ) there is one line of code for prediction:

model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

Why do I need to pass the label along with the features? Am I getting the whole idea wrong? Don't we want to predict the label? How am I going to predict whether my tweets are spam or not?

For the prediction, only lp.features is used; lp.label is treated as a key that is carried over unchanged. Quoting from the docs:

Use the model to make predictions on the values of a DStream and carry over its keys.

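I guess in your example you would simply want to replace predictOnValues with predictOn, which takes only the feature vectors and returns the index of the nearest cluster for each one. Keep in mind that the result is a cluster index, not a spam/not-spam label.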
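To make the difference concrete, here is a minimal plain-Java sketch of the semantics only. It does not use Spark and is not Spark's implementation: the class name KeyCarryOverSketch, the fixed centers, and the tweet ids are all hypothetical, and the two static methods merely mimic what predictOn and predictOnValues do to one batch of data.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Spark.
public class KeyCarryOverSketch {

    // Index of the center closest to the point (squared Euclidean distance).
    static int predict(double[][] centers, double[] features) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < features.length; j++) {
                double diff = centers[i][j] - features[j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // Analogue of predictOn: feature vectors in, cluster indices out.
    static List<Integer> predictOn(double[][] centers, List<double[]> batch) {
        List<Integer> out = new ArrayList<>();
        for (double[] features : batch) {
            out.add(Integer.valueOf(predict(centers, features)));
        }
        return out;
    }

    // Analogue of predictOnValues: the key is never inspected, it is only
    // copied next to the prediction made from the value.
    static <K> List<Map.Entry<K, Integer>> predictOnValues(
            double[][] centers, List<Map.Entry<K, double[]>> batch) {
        List<Map.Entry<K, Integer>> out = new ArrayList<>();
        for (Map.Entry<K, double[]> e : batch) {
            Integer cluster = Integer.valueOf(predict(centers, e.getValue()));
            out.add(new SimpleEntry<K, Integer>(e.getKey(), cluster));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed, already-trained centers for two clusters.
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};

        // The key can be anything: here a made-up tweet id instead of a label.
        List<Map.Entry<String, double[]>> keyed = new ArrayList<>();
        keyed.add(new SimpleEntry<String, double[]>("tweet-a", new double[]{1.0, 0.5}));
        keyed.add(new SimpleEntry<String, double[]>("tweet-b", new double[]{9.0, 11.0}));

        for (Map.Entry<String, Integer> e : predictOnValues(centers, keyed)) {
            System.out.println(e.getKey() + " -> cluster " + e.getValue());
        }
    }
}
```

In the Scala example the key happens to be the label of the test point, which is convenient for comparing the assigned cluster with a known label, but any key (or none, via predictOn) works just as well.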
