I have a Cassandra table person_master (personId: Int, customerId: Int, firstName: String, lastName: String, mrids: Set) with primary key (personId, customerId).
Suppose I have an input RDD with the structure [personId, customerId, firstName, lastName, messageType: String, source: String, sourceType: String].
Suppose the RDD holds the value [1001,119,None,None,{abc.xyz}] and the Cassandra row holds [1001,119,Vikash,Singh,{aaa.bbb}].
I want to fetch the Cassandra row that matches the RDD value, update the mrids column, and take all other columns from the Cassandra row.
E.g., in this case I want the final RDD value to be [1001,119,Vikash,Singh,{aaa.bbb,abc.xyz}], which I will write back to Cassandra later.
Can anybody give me a solution for doing this in Spark using the Cassandra connector?
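Assuming sc is a SparkContext created like this:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._  // adds cassandraTable and saveToCassandra

val sparkConf = new SparkConf()
  .setMaster(SPARK_MASTER)
  .setAppName(SPARK_SCALA_APP_NAME)
  .setJars(SPARK_SCALA_JAR)
// Replace value with your own host and credentials
sparkConf.set("spark.cassandra.connection.host", value)
sparkConf.set("spark.cassandra.auth.username", value)
sparkConf.set("spark.cassandra.auth.password", value)
val sc = new SparkContext(sparkConf)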
The where clause is optional. It is pushed down to Cassandra, so it works only on columns Cassandra can filter on, such as the partition key:
val selectedRow = sc.cassandraTable("keyspace", "tableName")
  .select("key", "column2", "column3")
  .where("key IN ?", keys)
  .as((key: String, column2: String, column3: Integer) =>
    (key, column2, column3))
Do the filtering and modification on your RDD, then save it like this:
selectedRow.saveToCassandra(
  "keyspace",
  "tableName",
  SomeColumns("key", "column2", "column3"))
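The "modification" step for the question's case is a set union on mrids, keeping every other column from the stored row. A minimal sketch of that merge in plain Scala follows. The case classes IncomingPerson and PersonRow and the object MergeMrids are hypothetical names introduced for illustration, and plain Seq values stand in for the RDDs so the logic can be checked without a cluster.

```scala
// Hypothetical case classes mirroring the question's data (names are illustrative).
final case class IncomingPerson(personId: Int, customerId: Int, mrids: Set[String])
final case class PersonRow(personId: Int, customerId: Int,
                           firstName: String, lastName: String, mrids: Set[String])

object MergeMrids {
  // Keep every column of the stored row, but union its mrids with the incoming ones.
  def merge(incoming: IncomingPerson, stored: PersonRow): PersonRow =
    stored.copy(mrids = stored.mrids ++ incoming.mrids)

  // Pair each incoming record with the stored row that shares its primary key,
  // then merge. Incoming records with no stored row are dropped in this sketch.
  def mergeAll(incoming: Seq[IncomingPerson], stored: Seq[PersonRow]): Seq[PersonRow] = {
    val byKey = stored.map(r => (r.personId, r.customerId) -> r).toMap
    incoming.flatMap(in => byKey.get((in.personId, in.customerId)).map(merge(in, _)))
  }
}
```

With the question's values, MergeMrids.mergeAll(Seq(IncomingPerson(1001, 119, Set("abc.xyz"))), Seq(PersonRow(1001, 119, "Vikash", "Singh", Set("aaa.bbb")))) yields a single row whose mrids contains both aaa.bbb and abc.xyz. In Spark, the same merge function could be mapped over an RDD of (incoming, stored) pairs, for example pairs produced by the connector's joinWithCassandraTable, before calling saveToCassandra. How the stored rows get converted to PersonRow is left as an assumption here.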