
How to stream data out of a Cassandra table?

I would like to stream data from a Cassandra table that is updated in real time. Yes, it is a database, but is there a way to do that? If so, should I keep an offset, and what CQL queries can I use?

The short answer is no.

The long answer is that, with a lot of difficulty and carefully chosen clustering keys, you may be able to. If you insert data with a clustering key that always increases, you can repeatedly scan for clustering keys within a recent time window. This will of course miss out-of-order inserts that land outside your window, which may or may not be good enough for your use case.
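As a rough sketch of that polling approach, the Go code below keeps an offset (the highest clustering-key value seen so far) and re-scans a small lookback interval behind it so that slightly late inserts are still picked up. Everything here is an assumption made for illustration: the `events` table, its columns, the `Row` and `Poller` types, and the lookback/deduplication scheme are not from the original post. Running the query against Cassandra is left out, so the rows are supplied by hand.

```go
package main

import "fmt"

// Hypothetical table assumed by this sketch:
//   CREATE TABLE events (bucket text, ts timestamp, id uuid, payload text,
//                        PRIMARY KEY ((bucket), ts, id));
// The poll selects a half-open window (from, to] on the clustering key ts.
const pollQuery = `SELECT ts, id, payload FROM events WHERE bucket = ? AND ts > ? AND ts <= ?`

// Row is a hypothetical, simplified result row.
type Row struct {
	TsMillis int64  // clustering key, milliseconds since the Unix epoch
	ID       string // unique row id, used to drop rows seen in an earlier poll
}

// Poller holds the offset between polls.
type Poller struct {
	lastSeen int64            // highest ts emitted so far (the offset)
	lookback int64            // how far behind the offset each poll re-scans
	seen     map[string]int64 // id -> ts for rows still inside the lookback
}

func NewPoller(start, lookback int64) *Poller {
	return &Poller{lastSeen: start, lookback: lookback, seen: make(map[string]int64)}
}

// Window returns the bounds to bind to the two ts placeholders in pollQuery.
func (p *Poller) Window(now int64) (from, to int64) {
	return p.lastSeen - p.lookback, now
}

// Accept returns only rows not emitted before and advances the offset.
func (p *Poller) Accept(rows []Row) []Row {
	var fresh []Row
	for _, r := range rows {
		if _, dup := p.seen[r.ID]; dup {
			continue // already emitted by an earlier, overlapping poll
		}
		p.seen[r.ID] = r.TsMillis
		fresh = append(fresh, r)
		if r.TsMillis > p.lastSeen {
			p.lastSeen = r.TsMillis
		}
	}
	// Forget ids that are too old to appear in any future window.
	for id, ts := range p.seen {
		if ts <= p.lastSeen-p.lookback {
			delete(p.seen, id)
		}
	}
	return fresh
}

func main() {
	p := NewPoller(1000, 500)
	fmt.Println(pollQuery)

	from, to := p.Window(2000)
	fmt.Println("scan window:", from, to)
	fmt.Println(p.Accept([]Row{{TsMillis: 1200, ID: "a"}, {TsMillis: 1800, ID: "b"}}))

	// Second poll: "c" was inserted late but still inside the lookback,
	// and "b" is returned again by the overlapping window.
	from, to = p.Window(3000)
	fmt.Println("scan window:", from, to)
	fmt.Println(p.Accept([]Row{{TsMillis: 1500, ID: "c"}, {TsMillis: 1800, ID: "b"}, {TsMillis: 2100, ID: "d"}}))
}
```

A larger lookback tolerates later out-of-order inserts at the cost of re-reading more rows on every poll. Anything inserted further behind the offset than the lookback is still missed, which is the limitation described above.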

The best answer in the future is Change Data Capture: https://issues.apache.org/jira/browse/CASSANDRA-8844

I understand you were asking specifically about streaming data out of Cassandra, but I would like to suggest that a technology like Apache Kafka sounds like a much better fit for what you're trying to do. It is used by a number of large companies and has fantastic real-time performance.

There is a seminal blog post by Jay Kreps called "The Log: What every software engineer should know about real-time data's unifying abstraction" that does a great job of explaining Kafka's purpose and design. A key quote from the blog post summarizes Kafka's role:

Take all the organization's data and put it into a central log for real-time subscription.获取组织的所有数据并将其放入中央日志以进行实时订阅。

To stream the data from Cassandra, use the PageSize option so that the driver fetches the result set one page at a time, like so:

iter := cass.Query(`SELECT * FROM cmuser.users;`).PageSize(100).Iter()

The above is an example in Go. The description of PageSize is:

PageSize will tell the iterator to fetch the result in pages of size n. This is useful for iterating over large result sets, but setting the page size too low might decrease the performance. This feature is only available in Cassandra 2 and onwards.
