Spark Cassandra Iterative Query

I am applying the following through the Spark Cassandra Connector:

val links = sc.textFile("linksIDs.txt")
links.map { link_id =>
  val link_speed_records = sc.cassandraTable[Double]("freeway", "records")
    .select("speed")
    .where("link_id = ?", link_id)
  val average = link_speed_records.mean()
}

I would like to ask if there is a way to apply the above sequence of queries more efficiently, given that the only parameter that ever changes is 'link_id'.

The 'link_id' value is the only partition key of my Cassandra 'records' table. I am using Cassandra v2.0.13, Spark v1.2.1, and Spark-Cassandra Connector v1.2.1.

I was wondering whether it is possible to open a Cassandra session in order to apply those queries and still get 'link_speed_records' back as a Spark RDD.
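A minimal sketch of that session idea, using the connector's CassandraConnector helper (assuming 'link_id' is a text column); note the rows come back as plain driver objects inside each Spark partition, not as an RDD per query:

import scala.collection.JavaConverters._
import com.datastax.spark.connector.cql.CassandraConnector

// CassandraConnector is serializable and reuses one session per executor JVM.
val connector = CassandraConnector(sc.getConf)

val averages = sc.textFile("linksIDs.txt").mapPartitions { ids =>
  connector.withSessionDo { session =>
    // Materialize inside withSessionDo so the session is still open
    // while the rows are read; an empty result set yields NaN here.
    ids.map { id =>
      val speeds = session
        .execute("SELECT speed FROM freeway.records WHERE link_id = ?", id)
        .all().asScala.map(_.getDouble("speed"))
      (id, speeds.sum / speeds.size)
    }.toList.iterator
  }
}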

Use the joinWithCassandraTable method to use an RDD of keys to pull data out of a Cassandra table. The method given in the question will be extremely expensive by comparison and also will not function well as a parallelizable request.

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md#performing-efficient-joins-with-cassandra-tables-since-12
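A minimal sketch of what that join could look like here (assuming 'link_id' is the single, text-typed partition-key column, so the default join on the partition key applies; the per-link averaging step is an added illustration, not part of the linked docs):

import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on Spark 1.2)
import com.datastax.spark.connector._

// One partition key per line of the ids file; Tuple1 maps positionally
// onto the single partition-key column (link_id).
val linkIds = sc.textFile("linksIDs.txt").map(Tuple1(_))

// One targeted, parallelized query per key instead of building a
// full cassandraTable RDD per id.
val joined = linkIds
  .joinWithCassandraTable("freeway", "records")
  .select("speed")

// Average the speeds per link id.
val averages = joined
  .map { case (Tuple1(id), row) => (id, (row.getDouble("speed"), 1L)) }
  .reduceByKey { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
  .mapValues { case (sum, n) => sum / n }

The linked doc also covers repartitionByCassandraReplica, which can be applied to linkIds first so the lookups run with data locality.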
