
Cassandra huge data reading using java driver

I have to read 3 TB of production data from a Cassandra database.

I have implemented paging using the Java driver, but this technique uses an offset value, which means I am tracing my data all over again to reach a particular row, and this process uses heap memory, which is not good practice. I want to read the data without using lots of heap memory.

Typically I want to fetch 10000 rows in a batch and then read the next 10000 without reading the first ten thousand rows again.

Read latency is not a concern for me; my only problem is reading the data without consuming lots of heap memory...

Here is part of my code:

Statement select = QueryBuilder.select().all().from("demo", "emp");

and this is how I am paging:

List<Row> secondPageRows = cassandraPaging.fetchRowsWithPage(select, 100001, 25000);
printUser(secondPageRows);

Here 100001 is the start value from which I want to output my rows and 25000 is the size of the page. So I first have to get all the way to row 100000, and only then can I print the 100001st value. This is causing me the heap problem, and on top of that, in my case I don't want to have to reach the end of one page just to get the first record of the next page.
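For context, my fetchRowsWithPage helper essentially amounts to the sketch below (this is a simplified reconstruction, not my exact code; the constructor and Session wiring are assumptions). It shows why offset paging hurts: every row before the start index is still fetched from the cluster and then thrown away, so later pages cost more and more work.

import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;

public class CassandraPaging {
    private final Session session;

    public CassandraPaging(Session session) {
        this.session = session;
    }

    // Offset-based paging: every row before 'start' is fetched from the
    // cluster and then discarded, so reaching row 100001 means iterating
    // over the first 100000 rows again.
    public List<Row> fetchRowsWithPage(Statement statement, int start, int pageSize) {
        ResultSet rs = session.execute(statement);
        List<Row> page = new ArrayList<>(pageSize);
        int index = 1;
        for (Row row : rs) {
            if (index >= start) {
                page.add(row);
                if (page.size() == pageSize) {
                    break;
                }
            }
            index++;
        }
        return page;
    }
}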

I can think of 2 possible solutions for this:

1) You need a better data model to handle this query. Remodel your table to handle such queries.

2) Use a Spark job to handle such requests; for this you need a separate data center to handle these queries, so you don't have to bother about heap memory (see the sketch after this list).
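A rough sketch of option 2 using the DataStax spark-cassandra-connector's Java API (the contact point and app name below are placeholders for your setup, not values from your question). The connector splits the table by token range across Spark executors, so rows are streamed through the workers instead of being accumulated in a single client's heap:

import com.datastax.spark.connector.japi.CassandraJavaUtil;
import com.datastax.spark.connector.japi.CassandraRow;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class EmpFullScan {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("emp-full-scan")
                // placeholder: point this at your Cassandra cluster
                .set("spark.cassandra.connection.host", "10.0.0.1");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Each Spark partition reads only one slice of the token range,
        // so no single JVM ever holds the whole 3 TB result.
        JavaRDD<CassandraRow> rows = CassandraJavaUtil.javaFunctions(sc)
                .cassandraTable("demo", "emp");

        System.out.println("total rows: " + rows.count());
        sc.stop();
    }
}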

FYI, the document below could help, although I have never tried it myself.

https://docs.datastax.com/en/developer/java-driver/3.6/manual/paging/

Here the driver will take care of pagination.
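To make that concrete, with driver 3.x you normally just set a fetch size and iterate: the driver pulls one page at a time from the server, so only roughly one page of rows sits in the heap at any moment. If you want to stop at a page boundary and resume later (for example, the next page in a separate request), you can save the PagingState. A minimal sketch, with the contact point as a placeholder:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class PagedRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect()) {

            Statement select = QueryBuilder.select().all().from("demo", "emp");
            select.setFetchSize(10000); // rows per page pulled from the server

            // For a plain full scan you can simply iterate the ResultSet:
            // the driver fetches the next page on demand, so only about one
            // page of rows is held in memory at a time.
            ResultSet rs = session.execute(select);

            // To stop at a page boundary and resume later, consume only the
            // rows of the current page, then save the paging state.
            int remaining = rs.getAvailableWithoutFetching();
            for (Row row : rs) {
                System.out.println(row);
                if (--remaining == 0) {
                    break;
                }
            }
            PagingState pagingState = rs.getExecutionInfo().getPagingState();

            if (pagingState != null) {
                // The state can also be serialized with toString()/fromString()
                // if the next page is requested from a different process.
                Statement resume = QueryBuilder.select().all().from("demo", "emp");
                resume.setFetchSize(10000);
                resume.setPagingState(pagingState);
                ResultSet nextPage = session.execute(resume);
                System.out.println("first row of next page: " + nextPage.one());
            }
        }
    }
}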
