How do I stream data out of a Cassandra table?

Question

I would like to stream data from a Cassandra table that is updated in real time. Yes, it is a database, but is there a way to do that? If so, should I keep an offset, and what CQL queries can I use?

Answer 1

The short answer is no.

The long answer is that, with a lot of difficulty and carefully chosen clustering keys, you may be able to do it. Basically, if you insert data with a clustering key that always increases, you can repeatedly scan for clustering keys within a recent time window. This will of course miss out-of-order inserts that fall outside your window, which may or may not be good enough for your use case.

The best answer in the future is Change Data Capture: https://issues.apache.org/jira/browse/CASSANDRA-8844

Answer 2

I understand you were asking specifically about streaming data out of Cassandra, but I would like to suggest that a technology like Apache Kafka sounds like a much better fit for what you're trying to do. It is used by a number of large companies and has fantastic real-time performance.

There is a seminal blog post by Jay Kreps called "The Log: What every software engineer should know about real-time data's unifying abstraction" that does a great job of explaining Kafka's purpose and design. A key quote from the blog post summarizes Kafka's role:

"Take all the organization's data and put it into a central log for real-time subscription."

Answer 3

To stream the data from Cassandra, you want to use the PageSize option like so:

iter := cass.Query(`SELECT * FROM cmuser.users;`).PageSize(100).Iter()

The above is an example in Go. The description of PageSize is:

PageSize will tell the iterator to fetch the result in pages of size n. This is useful for iterating over large result sets, but setting the page size too low might decrease the performance. This feature is only available in Cassandra 2 and onwards.

3 answers

Answer 1: 7, 2016-02-29 23:09:12
Answer 2: 0, 2019-03-22 21:57:45
Answer 3: 0
