I have a Cassandra table, and I have selected some columns from it to run association rules on. I created a case class for each column to hold its values. The column data has the type
com.datastax.spark.connector.rdd.CassandraRDD[SuperStoreSalesRG]
where SuperStoreSalesRG is the case class for a single column. I want to convert it to
RDD[Array[String]]
How can I do that? Many thanks.
This is what I've tried so far:
val test_spark_rdd = sc.cassandraTable("demo1", "orders4")
case class SuperStoreSalesPC (ProductCategory: String)
case class SuperStoreSalesCS (CustomerSegment: String)
case class SuperStoreSalesRG (Region: String)
val resultPC = test_spark_rdd.select("productcategory").as(SuperStoreSalesPC)
val resultCS = test_spark_rdd.select("customersegment").as(SuperStoreSalesCS)
val resultRG = test_spark_rdd.select("region").as(SuperStoreSalesRG)
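As an aside, if each element of the result should simply be a one-element Array[String] per row (rather than the whole column collected into one array), a plain .map on the RDD is enough. A minimal plain-Scala sketch, where a List and hypothetical sample values ("Furniture", "Technology") stand in for the CassandraRDD:

```scala
// A List stands in for the CassandraRDD; the sample values are hypothetical.
case class SuperStoreSalesPC(ProductCategory: String)

val resultPC = List(SuperStoreSalesPC("Furniture"), SuperStoreSalesPC("Technology"))

// Each case-class row maps to a one-element Array[String]; on the real RDD
// the equivalent call would be resultPC.map(r => Array(r.ProductCategory)).
val arrayed: List[Array[String]] = resultPC.map(r => Array(r.ProductCategory))

println(arrayed.map(_.mkString).mkString(","))
```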
I want to convert each of the vals resultPC, resultCS, and resultRG into a separate RDD[Array[String]], where each val holds one column.
After you have separated the three columns "productcategory", "customersegment", and "region" into the three RDDs resultPC, resultCS, and resultRG, you can convert each of them to RDD[Array[String]] as follows.
The first step is to convert each RDD to a DataFrame and gather the whole column into a single array with the built-in collect_list aggregate function:
import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF(); assumes a SparkSession named spark

val arrayedResultPC = resultPC.toDF().agg(collect_list("productcategory") as "productcategory")
which creates a DataFrame with the following schema:
root
 |-- productcategory: array (nullable = true)
 |    |-- element: string (containsNull = true)
You can do the same for the other two columns.
The final step is to convert the aggregated DataFrame to RDD[Array[String]]:

val arrayedRdd = arrayedResultPC.rdd.map(_.getSeq[String](0).toArray)

Note the .toArray: without it you would get an RDD of WrappedArray[String] rather than Array[String]. The resulting RDD has a single element: an Array[String] containing every value of the column.
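The last step works because each Spark Row stores the collected column as a Scala sequence in field 0, which can be cast back and materialized as an Array. A plain-Scala sketch of that extraction, where a Seq[Any] stands in for the Spark Row and the values are hypothetical:

```scala
// A Seq[Any] stands in for a Spark Row holding one collected array column;
// the sample values are hypothetical.
val row: Seq[Any] = Seq(Seq("Furniture", "Technology"))

// Field 0 holds the collected column; cast it back to Seq[String]
// and materialize it as Array[String].
val asArray: Array[String] = row(0).asInstanceOf[Seq[String]].toArray

println(asArray.mkString(","))
```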
I hope this answer is helpful.