Convert RDD to DStream to apply StreamingKMeans algorithm in Apache Spark MLlib

Original 2016-06-29 05:11:27 6 1 scala/ apache-spark/ k-means/ apache-spark-mllib

I have Scala code for anomaly detection on the KDD Cup dataset. The code is at https://github.com/prashantprakash/KDDDataResearch/blob/master/Code/approach1Plus2/src/main/scala/PCA.scala

I want to try a new technique: use the StreamingKMeans algorithm from MLlib and update my StreamingKMeans model whenever the condition on line 288 of the above code, "if( dist < threshold ) {", is true; i.e., when a test point is classified as normal, update the KMeans model with the new "normal" data point.

I see that StreamingKMeans takes its data in the form of DStreams. Please help me convert the existing RDD to a DStream.

I found this link, http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DStream-td11145.html, but it didn't help much.

Also, please advise if there is a better design to solve the problem.

1 answer

As far as I know, there is no direct conversion from an RDD to a DStream: an RDD is a fixed collection of data, while a DStream represents a continuous sequence of incoming data (internally, a series of RDDs, one per batch interval).
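If the goal is only to fold newly accepted "normal" points into the model, a DStream may not be needed at all: MLlib's StreamingKMeansModel has an update method that takes an RDD[Vector] directly. The following is a minimal sketch under stated assumptions, not the asker's actual pipeline: the object name, the initial centers and weights, and the sample points are all hypothetical placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.StreamingKMeansModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object IncrementalUpdateSketch {
  // Runs one incremental update and returns the resulting cluster centers.
  def run(): Array[Vector] = {
    val conf = new SparkConf().setAppName("IncrementalUpdateSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical starting centers and weights; in practice these could be
      // taken from a batch KMeans model trained on the normal training data.
      val initialCenters: Array[Vector] =
        Array(Vectors.dense(0.0, 0.0), Vectors.dense(5.0, 5.0))
      val initialWeights: Array[Double] = Array(1.0, 1.0)
      val model = new StreamingKMeansModel(initialCenters, initialWeights)

      // Hypothetical batch of test points that passed the "dist < threshold" check.
      val normalPoints: RDD[Vector] = sc.parallelize(Seq(
        Vectors.dense(0.2, -0.1),
        Vectors.dense(4.8, 5.3)
      ))

      // decayFactor = 1.0 keeps all history; timeUnit is "batches" or "points".
      val updated = model.update(normalPoints, 1.0, "batches")
      updated.clusterCenters
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Collecting the accepted points into one RDD per round and calling update once is likely cheaper than updating per point, since each call runs a Spark job.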

If you want to use StreamingKMeans, take the data that you currently load into an RDD and ingest it as a DStream instead, possibly using KafkaUtils.createDirectStream or ssc.textFileStream.
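For experiments, Spark Streaming can also build a DStream from a queue of RDDs with StreamingContext.queueStream, which, as far as I know, is intended mainly for testing. The following is a minimal sketch of feeding an existing RDD to StreamingKMeans that way; the object name, the sample vectors, k = 2, the one-second batch interval, and the five-second timeout are assumptions chosen only for illustration.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object QueueStreamSketch {
  // Trains on one queued RDD and returns the number of cluster centers.
  def run(): Int = {
    val conf = new SparkConf().setAppName("QueueStreamSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Stand-in for the RDD that already exists in the batch code.
    val existingRdd: RDD[Vector] = ssc.sparkContext.parallelize(Seq(
      Vectors.dense(0.1, 0.2),
      Vectors.dense(0.0, -0.1),
      Vectors.dense(5.1, 4.9),
      Vectors.dense(4.8, 5.2)
    ))

    // Each RDD pushed onto the queue becomes one batch of the DStream.
    val rddQueue = new mutable.Queue[RDD[Vector]]()
    rddQueue += existingRdd
    val stream = ssc.queueStream(rddQueue)

    // Two-dimensional data, two clusters, random initial centers with zero weight.
    val streamingKMeans = new StreamingKMeans()
      .setK(2)
      .setDecayFactor(1.0)
      .setRandomCenters(2, 0.0)
    streamingKMeans.trainOn(stream)

    ssc.start()
    // Let a few batch intervals elapse so the queued RDD is consumed.
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)

    streamingKMeans.latestModel().clusterCenters.length
  }

  def main(args: Array[String]): Unit = println(run())
}
```

If more RDDs are pushed onto the queue while the context is running, the Spark examples wrap the enqueue in rddQueue.synchronized to avoid races with the streaming scheduler.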

Hope this helps!

Convert RDD of Vector in LabeledPoint using Scala - MLLib in Apache Spark

Convert Spark Data Frame to org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]

How to convert RDD[org.apache.spark.sql.Row] to RDD[org.apache.spark.mllib.linalg.Vector]

How to convert spark DataFrame to RDD mllib LabeledPoints?

Convert an org.apache.spark.mllib.linalg.Vector RDD to a DataFrame in Spark using Scala

How to apply “Sum(vi * ln(vi))” on each row of an RDD “org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.linalg.Vector)]”

DStream to Rdd in Spark-Straming

Converting RDD[org.apache.spark.sql.Row] to RDD[org.apache.spark.mllib.linalg.Vector]

spark(scala) three separated rdd[org.apache.spark.mllib.linalg.Vector] to a single rdd[Vector]

Scala Spark - using RDD with mllib




solution1 1 2016-08-10 15:59:03
