
How to get range of rows using spark in Cassandra

I have a table in Cassandra whose structure is like this:

CREATE TABLE dmp.Table (
    pid text PRIMARY KEY,
    day_count map<text, int>,
    first_seen map<text, timestamp>,
    last_seen map<text, timestamp>,
    usage_count map<text, int>
);

Now I'm trying to query it using the spark-cassandra driver. Is there any way I can get the data in chunks? For example, if I have 100 rows, I should be able to get rows 0-10, then 10-20, and so on.

 CassandraJavaRDD<CassandraRow> cassandraRDD = CassandraJavaUtil.javaFunctions(javaSparkContext).cassandraTable(keySpaceName, tableName);
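The "rows 0-10, then 10-20" access pattern amounts to pairing each row with a sequential position and slicing by position. A minimal sketch of that chunking logic on a plain Java list (the integer rows here are a stand-in for `CassandraRow` objects; `chunks` and the chunk size of 10 are illustrative, not part of the connector API):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {
    // Split a list of rows into fixed-size chunks: [0,10), [10,20), ...
    static <T> List<List<T>> chunks(List<T> rows, int chunkSize) {
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            out.add(new ArrayList<>(rows.subList(start, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) rows.add(i); // stand-in for 100 rows
        List<List<Integer>> pages = chunks(rows, 10);
        System.out.println(pages.size());          // number of chunks
    }
}
```

Note that collecting all rows to the driver first defeats the point of Spark for large tables; this only illustrates the slicing arithmetic, not a distributed solution.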

I'm asking this because there is no column in my table that I could query with an IN clause to get a range of rows.

You can add an auto-incrementing ID column -- see my DataFrame-ified Zip With Index solution. Then you can query by the newly created id column:

SELECT ... WHERE id >= 0 and id < 10;

Etc.
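The zip-with-index idea behind that answer can be sketched without Spark: assign each row a sequential id, then filter by id range, which is exactly what the `WHERE id >= 0 AND id < 10` query does. In this sketch, `RowWithId`, `zipWithIndex`, and `range` are hypothetical helpers written for illustration, not part of any library:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ZipWithIndexDemo {
    // Pair each row value with a sequential id, like zipWithIndex on an RDD.
    record RowWithId(long id, String pid) {}

    static List<RowWithId> zipWithIndex(List<String> pids) {
        return IntStream.range(0, pids.size())
                .mapToObj(i -> new RowWithId(i, pids.get(i)))
                .collect(Collectors.toList());
    }

    // Equivalent of: SELECT ... WHERE id >= lo AND id < hi
    static List<RowWithId> range(List<RowWithId> rows, long lo, long hi) {
        return rows.stream()
                .filter(r -> r.id() >= lo && r.id() < hi)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> pids = IntStream.range(0, 100)
                .mapToObj(i -> "pid-" + i)
                .collect(Collectors.toList());
        List<RowWithId> indexed = zipWithIndex(pids);
        System.out.println(range(indexed, 0, 10).size()); // first page of 10
    }
}
```

In the actual DataFrame version the id is attached per-partition on the executors, so the data never has to be collected to the driver.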
