
Convert CassandraRDD to RDD[Array[String]]

I have a Cassandra table, and I have selected some of its columns to run association rules on. I created a case class for each column to hold its values. Each column's data has the type

com.datastax.spark.connector.rdd.CassandraRDD[SuperStoreSalesRG]

where SuperStoreSalesRG is the case class for a single column. I want to convert it to

RDD[Array[String]]

How can I do that?

Many thanks.

This is what I've tried so far:

import com.datastax.spark.connector._

val test_spark_rdd = sc.cassandraTable("demo1", "orders4")

case class SuperStoreSalesPC (ProductCategory: String) 
case class SuperStoreSalesCS (CustomerSegment: String) 
case class SuperStoreSalesRG (Region: String) 

val resultPC = test_spark_rdd.select("productcategory").as(SuperStoreSalesPC)
val resultCS = test_spark_rdd.select("customersegment").as(SuperStoreSalesCS)
val resultRG = test_spark_rdd.select("region").as(SuperStoreSalesRG)

I want to convert each of the vals resultPC, resultCS, and resultRG, which hold the columns, into a separate RDD[Array[String]].

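After you separate the three columns "productcategory", "customersegment", and "region" into three datasets resultPC, resultCS, and resultRG, you can convert each of them to RDD[Array[String]] as follows. Note that withColumn is a DataFrame method, so this assumes the three datasets have been loaded as DataFrames rather than as CassandraRDDs.

The first step is to use the built-in collect_list function: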

import org.apache.spark.sql.functions._
val arrayedResultPC = resultPC.withColumn("productcategory", collect_list("productcategory"))

which creates a dataset with the following schema:

root
 |-- productcategory: array (nullable = true)
 |    |-- element: string (containsNull = true)

You can do the same for the other two datasets.

The final step is to convert the collected datasets to RDD[Array[String]]:
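To see what this step produces without a Cassandra cluster, here is a minimal local sketch. The object name CollectListSketch, the helper collectColumn, and the in-memory sample values are made up for illustration. It uses agg instead of withColumn; for a single-column DataFrame that should give the same one-row result as the call above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

object CollectListSketch {
  // Hypothetical helper: builds a one-column DataFrame from in-memory values,
  // aggregates it with collect_list, and returns the resulting rows.
  def collectColumn(spark: SparkSession, values: Seq[String]): Array[Seq[String]] = {
    import spark.implicits._

    // Stand-in for the "productcategory" column read from Cassandra (assumption).
    val resultPC = values.toDF("productcategory")

    // Global aggregation: every value of the column ends up in one array.
    val arrayedResultPC =
      resultPC.agg(collect_list("productcategory").as("productcategory"))

    arrayedResultPC.printSchema()
    arrayedResultPC.rdd.map(_.getSeq[String](0)).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collect-list-sketch")
      .getOrCreate()

    val rows = collectColumn(spark, Seq("Furniture", "Technology", "Furniture"))
    rows.foreach(r => println(r.mkString(", ")))

    spark.stop()
  }
}
```

Because collect_list is applied without a grouping, the result has a single row, so the RDD derived from it holds exactly one array containing every value of the column.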

// getSeq reads the array column as a Seq[String]; toArray turns it into Array[String]
val arrayedRdd = arrayedResultPC.rdd.map(_.getSeq[String](0).toArray)
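If resultRG is still a CassandraRDD[SuperStoreSalesRG] as in the question, a plain map may be enough, since CassandraRDD is an ordinary RDD. The sketch below is only an illustration under that assumption: the object RddToArraySketch and the helper toArrays are hypothetical names, and sc.parallelize with made-up region values stands in for the Cassandra read. Unlike the collect_list approach, it yields one single-element array per row.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Same shape as the case class in the question.
case class SuperStoreSalesRG(Region: String)

object RddToArraySketch {
  // Hypothetical helper: wraps the single field of each row in an Array[String].
  def toArrays(rows: RDD[SuperStoreSalesRG]): RDD[Array[String]] =
    rows.map(row => Array(row.Region))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("rdd-to-array-sketch")
    val sc = new SparkContext(conf)

    // Stand-in for sc.cassandraTable(...).select("region").as(SuperStoreSalesRG)
    // (assumption: sample values are invented for this sketch).
    val resultRG = sc.parallelize(Seq(SuperStoreSalesRG("West"), SuperStoreSalesRG("Central")))

    val arrays: RDD[Array[String]] = toArrays(resultRG)
    arrays.collect().foreach(a => println(a.mkString(", ")))

    sc.stop()
  }
}
```

The same pattern would apply to resultPC and resultCS by mapping over their respective fields.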

I hope the answer is helpful


 