
Apache Cassandra crashes under heavy read load

I've been working on an application that requires regular writes and massive reads at once.

The application stores a few text columns, which are not very big, and a map, which is the biggest column in the table.

Working with Phantom-DSL in Scala (with the Datastax Java driver underneath), my application crashes when the data size increases.

Here is a log from my application.

[error] - com.websudos.phantom - All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.OperationTimedOutException: [/127.0.0.1:9042] Operation timed out))

And here are the Cassandra logs.

I have posted the Cassandra logs in a pastebin because they were too large to embed here.

I can't work out the reason for this crash. I have tried increasing the timeout and turning off the row cache.
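For reference, this is roughly how I raised the client-side read timeout (the contact point and timeout value here are just placeholders, not my real config):

```scala
import com.datastax.driver.core.{Cluster, SocketOptions}

// Raise the driver's per-request read timeout (default 12000 ms) so that
// large reads get more time before OperationTimedOutException fires.
val cluster = Cluster.builder()
  .addContactPoint("127.0.0.1")
  .withSocketOptions(new SocketOptions().setReadTimeoutMillis(30000))
  .build()
```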

From what I understand, this is a common problem and can be resolved by tuning Cassandra for this particular case.

My Cassandra data comes from several different sources, so writes are not very frequent. But reads are big: over 300K rows may be required at once, which then need to be transferred over HTTP.

The logs show significant GC pressure (a 5-second ParNew pause).

When you say "reads are big in size in that over 300K rows may be required at once", do you mean you're pulling 300K rows in a single query? The Datastax driver supports native paging: set the fetch size significantly lower (500 or 1000) and let it page through the query instead of trying to load all 300K rows in a single pass.
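A minimal sketch of that with the Java driver from Scala (the keyspace, table, and query below are hypothetical, since your schema wasn't posted):

```scala
import com.datastax.driver.core.{Cluster, SimpleStatement}
import scala.collection.JavaConverters._

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("my_keyspace") // hypothetical keyspace

// Fetch 500 rows per page instead of materializing all 300K rows at once;
// the driver transparently requests the next page as the iterator advances.
val stmt = new SimpleStatement("SELECT * FROM events WHERE source = ?", "sensor-1")
stmt.setFetchSize(500)

for (row <- session.execute(stmt).iterator().asScala) {
  // Stream each row out (e.g. into the HTTP response) instead of buffering
  // the whole result set in memory.
}
```

That way the client's heap holds one page at a time, and the same applies on the coordinator side.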

Maps (and collections in general) can be very demanding on Cassandra heap space. Changing your data model to replace the map with a separate table may solve your GC issues, but that's speculation, given the lack of further details on your Cassandra usage.
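As a hedged illustration of that data-model change (the table and column names below are invented, since the actual schema wasn't posted), the map can be unrolled into clustering rows:

```scala
import com.datastax.driver.core.Cluster

val session = Cluster.builder().addContactPoint("127.0.0.1").build()
  .connect("my_keyspace") // hypothetical keyspace

// Before (sketch): CREATE TABLE docs (id text PRIMARY KEY, title text,
//                                     attrs map<text, text>);
// After: one small row per former map entry under the same partition key,
// so reads can be paged instead of pulling the whole collection into the heap.
session.execute(
  """CREATE TABLE IF NOT EXISTS doc_attrs (
    |  id         text,
    |  attr_name  text,
    |  attr_value text,
    |  PRIMARY KEY (id, attr_name)
    |)""".stripMargin)
```

Cassandra reads a collection column in its entirety, whereas clustering rows can be read and paged individually, which is what relieves the heap pressure.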
