
Cassandra backup and restore consistency

The scenario is as follows: I have one Cassandra node with one keyspace, which contains two tables. Let's call these tables A and B. I have a script that inserts data very quickly into both tables in a batch statement. Table A has columns "k" and "value", and table B has the same two columns. The batch query is as follows:

BEGIN BATCH
INSERT INTO A(k, value) VALUES ('a', 1);
INSERT INTO B(k, value) VALUES ('b', 1);
APPLY BATCH;

The value (1 in this example) is incremented with every successive batch query. So if table A has (a, 1000), then table B must have (b, 1000), because (logged) batch queries are atomic.

Now my question is: how does nodetool snapshot work in this case? I have looked at the snapshotting source code, and it seems to work per keyspace, per table, one by one. So for example, at time 0 it takes a snapshot of table A, which has, say, ("a", 100). At time 1 a new batch query is inserted (with value 101). Then at time 2 it takes a snapshot of table B, which means B's snapshot would have the value 101 but A's would not.

If the above explanation is correct, wouldn't that cause a problem when restoring? How would table A get ("a", 101) after restoring? Or would table B not have ("b", 101) after restoring?

Firstly, there is a fine line between atomicity and isolation. Batches guarantee that both inserts will be applied (atomicity), but they do not guarantee that both will become visible at exactly the same time (isolation).

A client can therefore still read one insert and not the other in a SELECT. The only exception to this rule is a batch that targets a single row (in current terminology, a single partition), in which case the writes are also isolated.
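To make the distinction concrete, here is a minimal Python sketch. It is a toy model and not Cassandra's implementation: the two tables are plain dictionaries, and the hypothetical helper `apply_batch` applies the mutations one at a time, pausing after each so that a reader can look in between.

```python
# Toy model only: in-memory "tables" and a batch applied one mutation at a time.
# It illustrates "atomic but not isolated"; it is not how Cassandra works internally.

def apply_batch(tables, mutations):
    """Apply every mutation in order, yielding after each one so that a
    concurrent reader can observe the tables while the batch is in flight."""
    for table, key, value in mutations:
        tables[table][key] = value
        yield  # a concurrent read could happen at this point


def read_both(tables):
    """Read the current value from table A and table B."""
    return tables["A"].get("a"), tables["B"].get("b")


tables = {"A": {"a": 100}, "B": {"b": 100}}
batch = apply_batch(tables, [("A", "a", 101), ("B", "b", 101)])

next(batch)                      # the first insert has been applied
mid_flight = read_both(tables)   # the reader sees A updated, B not yet
for _ in batch:                  # let the rest of the batch complete
    pass
after = read_both(tables)

print(mid_flight)  # (101, 100)
print(after)       # (101, 101)
```

Both writes always end up applied, which is the atomicity guarantee, yet the read taken mid-flight sees only one of them, which is the missing isolation.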

You are quite right that you would run into the problem you described. It is entirely possible to end up with a snapshot in which the numbers in the two tables are not the same.
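The timeline from the question can be modelled with a short Python sketch. It assumes, as the question describes, that tables are snapshotted one after another. The dictionaries and the helpers `snapshot_table` and `apply_batch` are hypothetical stand-ins for illustration, not nodetool or driver APIs.

```python
import copy

# Toy model: each "table" is a dict, and a snapshot is a copy taken at one instant.
live = {"A": {"a": 100}, "B": {"b": 100}}
snapshot = {}


def snapshot_table(name):
    """Copy the current contents of a single table into the snapshot."""
    snapshot[name] = copy.deepcopy(live[name])


def apply_batch(value):
    """Write the same value to both tables, as the batch in the question does."""
    live["A"]["a"] = value
    live["B"]["b"] = value


snapshot_table("A")   # time 0: A is captured with value 100
apply_batch(101)      # time 1: the next batch lands in both tables
snapshot_table("B")   # time 2: B is captured with value 101

# Restoring simply brings back whatever each per-table snapshot contained.
restored = copy.deepcopy(snapshot)
consistent = restored["A"]["a"] == restored["B"]["b"]

print(restored)    # {'A': {'a': 100}, 'B': {'b': 101}}
print(consistent)  # False
```

In this model the restored data holds ("a", 100) alongside ("b", 101): nothing in the snapshot itself brings table A forward to 101 or rolls table B back to 100.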
