Streaming KMeans with Spark (Java)
Hi. For our thesis, we want to use Kafka + Spark Streaming to catch Twitter spam, and I want to use StreamingKMeans for it. But I have a very basic, yet serious, question:
In the Spark StreamingKMeans Scala example ( https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeansExample.scala ) there is one line of code for prediction:
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
Why do I need to pass the label along with the features? Am I misunderstanding the whole idea? Isn't the label the thing we want to predict? How am I supposed to predict whether my tweets are spam or not?
Only lp.features is used for the prediction; lp.label is treated as a key that is carried over unchanged. Quoting from the docs:
Use the model to make predictions on the values of a DStream and carry over its keys.
I guess that in your case you would simply want to replace predictOnValues with predictOn, which takes only the feature vectors.
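To make the difference between the two calls concrete, here is a minimal plain-Java sketch that mimics the behaviour described in the quoted docs. It does not use Spark at all: the class KeyedPredictionSketch, its methods, and the two hard-coded cluster centers are hypothetical names and values chosen for illustration, and the nearest-center rule is only an assumption about how a k-means model assigns a point to a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a trained k-means model; NOT Spark's implementation.
final class KeyedPredictionSketch {

    // Returns the index of the center closest to the point (squared Euclidean distance).
    static int predict(double[][] centers, double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = centers[c][d] - point[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Analogue of predictOn: feature vectors in, cluster indices out.
    static List<Integer> predictOn(double[][] centers, List<double[]> features) {
        List<Integer> out = new ArrayList<>();
        for (double[] f : features) {
            out.add(Integer.valueOf(predict(centers, f)));
        }
        return out;
    }

    // Analogue of predictOnValues: the key is never read by the model,
    // it is only passed through next to the predicted cluster index.
    static <K> List<Map.Entry<K, Integer>> predictOnValues(
            double[][] centers, List<Map.Entry<K, double[]>> keyed) {
        List<Map.Entry<K, Integer>> out = new ArrayList<>();
        for (Map.Entry<K, double[]> e : keyed) {
            Integer cluster = Integer.valueOf(predict(centers, e.getValue()));
            out.add(new SimpleEntry<K, Integer>(e.getKey(), cluster));
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        List<Map.Entry<String, double[]>> tweets = new ArrayList<>();
        // The key can be anything, e.g. a tweet id instead of a label.
        tweets.add(new SimpleEntry<String, double[]>("tweet-1", new double[]{0.5, 1.0}));
        tweets.add(new SimpleEntry<String, double[]>("tweet-2", new double[]{9.0, 11.0}));
        for (Map.Entry<String, Integer> e : predictOnValues(centers, tweets)) {
            System.out.println(e.getKey() + " -> cluster " + e.getValue());
        }
    }
}
```

With these made-up centers, tweet-1 lands in cluster 0 and tweet-2 in cluster 1. Note that the returned integer is a cluster index, not a spam/not-spam label; the key is simply a convenient way to match each prediction back to its input.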