Join CassandraTableScanRDD[CassandraRow] with RDD[String]
I am writing a program where I have an RDD[String] and a CassandraTableScanRDD, and I want to do a left join between them. Is this possible? From what I saw online, joins only happen between CassandraTableScanRDDs.
The join functions are available for PairRDD objects (see here). A PairRDD object is an RDD of key-value pairs, for example: RDD[(Int, String)]
Typically you create a PairRDD object from a regular RDD using the keyBy function, which lets you specify which key to use. Then, when you run join, it joins the elements whose keys are equal.
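As a rough illustration of the approach described above, the sketch below keys both sides and then calls leftOuterJoin, the left-join variant that Spark provides for pair RDDs. It assumes the Spark Cassandra Connector is on the classpath and a Cassandra node is reachable. The host, the keyspace my_keyspace, the table my_table, the column id, and the sample strings are all hypothetical placeholders, not taken from the question. The Cassandra side is keyed with a plain map rather than keyBy, only to keep the sketch independent of the connector's own keyBy overloads.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector._

object LeftJoinSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("left-join-sketch")
      .setMaster("local[*]")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
    val sc = new SparkContext(conf)

    // Left side: a plain RDD[String]; each string serves as its own join key.
    val ids: RDD[String] = sc.parallelize(Seq("a", "b", "c")) // sample data
    val left: RDD[(String, String)] = ids.keyBy(s => s)

    // Right side: Cassandra rows keyed by a text column.
    // Keyspace, table, and column names are hypothetical.
    val right: RDD[(String, CassandraRow)] =
      sc.cassandraTable("my_keyspace", "my_table")
        .map(row => (row.getString("id"), row))

    // Left outer join: every left key is kept; the right value is None
    // when no Cassandra row has that key.
    val joined: RDD[(String, (String, Option[CassandraRow]))] =
      left.leftOuterJoin(right)

    joined.take(10).foreach { case (key, (_, maybeRow)) =>
      println(s"$key -> ${maybeRow.map(_.toString).getOrElse("<no match>")}")
    }

    sc.stop()
  }
}
```

Because the result type wraps the Cassandra row in an Option, strings without a matching row remain in the output, which is what distinguishes this from a plain join.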