
How to implement LEFT or RIGHT JOIN using spark-cassandra-connector

I have a Spark Streaming job that uses Cassandra as its datastore. I have a stream that needs to be joined with a Cassandra table. I am using spark-cassandra-connector, which has a great method, joinWithCassandraTable, that as far as I can tell implements an inner join with a Cassandra table:

val source: DStream[...] = ...
source.foreachRDD { rdd =>
  rdd.joinWithCassandraTable( "keyspace", "table" ).map{ ...
  }
}

So the question is: how can I implement a left outer join with a Cassandra table?

Thanks in advance.

This is currently not supported, but there is a ticket to introduce the functionality. Please vote on it if you would like it introduced in the future.

https://datastax-oss.atlassian.net/browse/SPARKC-181

A workaround is suggested in the ticket.

As RussS mentioned, this feature is not yet available in spark-cassandra-connector. As a workaround, I propose the following code snippet.

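import com.datastax.spark.connector.cql.CassandraConnector
import com.datastax.spark.connector.rdd.reader.PrefetchingResultSetIterator  // package path assumed from the connector 1.x line

// Build the connector on the driver; rdd.context is not available inside executor closures.
val connector = CassandraConnector(rdd.context.getConf)
// mapPartitions (not foreachPartition, which returns Unit) so the joined pairs come back as an RDD.
val joined = rdd.mapPartitions { partition =>
  connector.withSessionDo { session =>
    val pairs = for (
      leftSide <- partition;
      rightSide <- {
        // CQL string literals take single quotes; double quotes denote identifiers.
        val rs = session.execute(s"""SELECT * FROM "keyspace".table WHERE id = '${leftSide._2}'""")
        val iterator = new PrefetchingResultSetIterator(rs, 100)
        // No match: emit None so the left row is kept (left outer join semantics).
        if (iterator.isEmpty) Iterator(None)
        else iterator.map(r => Some(r.getString(1)))
      }
    ) yield (leftSide, rightSide)
    // The for-comprehension over an iterator is lazy; materialize it while the session is still held.
    pairs.toList.iterator
  }
}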
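
To make the shape of the result concrete, here is a minimal sketch of the same per-row lookup pattern on plain Scala collections, with no Spark or Cassandra involved. The `LeftJoinSketch` object, the in-memory `table` map, and the sample rows are all hypothetical stand-ins for illustration; only the "emit None when nothing matches" logic mirrors the workaround above.

```scala
object LeftJoinSketch {
  // Hypothetical stand-in for the Cassandra table: id -> matching values.
  val table: Map[String, Seq[String]] =
    Map("a" -> Seq("x1", "x2"), "b" -> Seq("y1"))

  // For every left row, look up its key (second tuple element) and pair it
  // with each match, or with None when there is no match (left outer join).
  def leftJoin[L](left: Iterator[(L, String)]): Iterator[((L, String), Option[String])] =
    for {
      leftSide <- left
      rightSide <- {
        val matches = table.getOrElse(leftSide._2, Seq.empty)
        val rights: Seq[Option[String]] =
          if (matches.isEmpty) Seq(None) else matches.map(Some(_))
        rights
      }
    } yield (leftSide, rightSide)

  // Sample left side: the key "missing" has no counterpart in the table.
  val joined: List[((Int, String), Option[String])] =
    leftJoin(Iterator((1, "a"), (2, "missing"), (3, "b"))).toList

  def main(args: Array[String]): Unit = joined.foreach(println)
}
```

A left row with several matches appears once per match, and a left row with no match appears exactly once with None on the right, which is what distinguishes this from the inner join that joinWithCassandraTable performs.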
