
Join CassandraTableScanRDD[CassandraRow] with RDD[String]

I am writing a program in which I have an RDD[String] and a CassandraTableScanRDD, and I want to do a left join between them.

Is this possible? From what I have seen online, joins only happen between two CassandraTableScanRDDs.

The join functions are available for PairRDD objects (see here).

A PairRDD object is an RDD of key-value pairs, for example: RDD[(Int, String)]

Typically you create a PairRDD object from a regular RDD using the keyBy function, which lets you specify which key to use. Then, when you run join, it joins the elements whose keys are equal. Both sides of the join must be keyed by the same key type.
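As a rough sketch of the keyBy-then-join approach described above, the following runs a left outer join in local mode. It rests on several assumptions not stated in the question: the strings in the RDD[String] are the join keys themselves, the Cassandra table has a text column called "id", and a hypothetical case class UserRow stands in for CassandraRow so the example runs without a Cassandra cluster. With the Spark Cassandra Connector, the right-hand side would instead be built along the lines shown in the comment.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for a CassandraRow with columns "id" and "name".
case class UserRow(id: String, name: String)

object LeftJoinSketch {

  // Key both RDDs by the same String key, then left-outer-join them.
  // Every element of `ids` is kept; rows with no match appear as None.
  def leftJoin(ids: RDD[String],
               rows: RDD[UserRow]): RDD[(String, (String, Option[UserRow]))] = {
    val keyedIds  = ids.keyBy(s => s)   // RDD[(String, String)]
    val keyedRows = rows.keyBy(_.id)    // RDD[(String, UserRow)]
    keyedIds.leftOuterJoin(keyedRows)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("left-join-sketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)
    try {
      val ids  = sc.parallelize(Seq("a", "b", "c"))
      val rows = sc.parallelize(Seq(UserRow("a", "Alice"), UserRow("c", "Carol")))

      // With the connector (assumed keyspace, table, and column names):
      //   import com.datastax.spark.connector._
      //   val keyedRows = sc.cassandraTable("my_keyspace", "my_table")
      //                     .keyBy(row => row.getString("id"))

      leftJoin(ids, rows).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Here "b" has no matching row, so it is paired with None, while "a" and "c" are paired with Some(...) of their rows. Using join instead of leftOuterJoin would drop "b" entirely.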
