
Convert RDD to DStream to apply StreamingKMeans algorithm in Apache Spark MLlib

I have Scala code for anomaly detection on the KDD Cup dataset. The code is at https://github.com/prashantprakash/KDDDataResearch/blob/master/Code/approach1Plus2/src/main/scala/PCA.scala

I wanted to try a new technique using the StreamingKMeans algorithm from MLlib and update my StreamingKMeans model whenever line 288 in the above code is true (`if (dist < threshold) {`); i.e. when the test point is classified as normal, update the KMeans model with the new "normal" data point.

I see that StreamingKMeans takes data in the form of DStreams. Please help in converting the existing RDD to a DStream.

I found a link, http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DStream-td11145.html, but it didn't help much.

Also, please advise if there is a better design to solve the problem.

As far as I know, an RDD cannot be converted into a DStream, because an RDD is a static collection of data, while a DStream is an abstraction over data arriving continuously.

If you want to use StreamingKMeans, instead of forming the data into an RDD, ingest it as a DStream from the start, for example using `KafkaUtils.createDirectStream` or `ssc.textFileStream`.
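Alternatively, if you already have the data in RDD form, `ssc.queueStream` can wrap a queue of RDDs as a DStream; Spark's docs note it is intended mainly for testing, but it fits this "push a new RDD whenever a point is classified as normal" use case. Below is a minimal sketch, not a drop-in for the linked code; the app name, batch interval, and all StreamingKMeans parameter values are illustrative assumptions.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object StreamingKMeansSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingKMeansSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // A queue of RDDs: each RDD pushed here is delivered as one batch
    // of the DStream, so the streaming model trains on it incrementally.
    val rddQueue = new mutable.Queue[RDD[Vector]]()
    val trainingStream = ssc.queueStream(rddQueue)

    val model = new StreamingKMeans()
      .setK(2)                  // number of clusters (illustrative)
      .setDecayFactor(1.0)      // 1.0 = all historical data weighted equally
      .setRandomCenters(3, 0.0) // dimension 3, initial center weight 0.0

    model.trainOn(trainingStream) // cluster centers update on every batch

    ssc.start()
    // Elsewhere, e.g. inside the "if (dist < threshold)" branch, enqueue
    // the newly classified normal point(s) as an RDD:
    // rddQueue += ssc.sparkContext.parallelize(Seq(Vectors.dense(0.1, 0.2, 0.3)))
    ssc.awaitTermination()
  }
}
```

One caveat with this design: `queueStream` does not support checkpointing, so if you need fault tolerance a real source (Kafka, a watched directory via `textFileStream`) is the safer route.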

Hope this helps!



 