简体   繁体   中英

High number of tombstones with TTL columns in Cassandra

I have a cassandra Column Family, or CQL table with the following schema:

CREATE TABLE user_actions (
    company_id varchar,
    employee_id varchar,
    inserted_at timeuuid,
    action_type varchar,
    PRIMARY KEY ((company_id, employee_id), inserted_at)
) WITH CLUSTERING ORDER BY (inserted_at DESC);

Basically a composite partition key that is made up of a company ID and an employee ID, and a clustering column, representing the insertion time, that is used to order the columns in reverse chronological order (newest actions are at the beginning of the row).

Here's what an insert looks like:

INSERT INTO user_actions (company_id, employee_id, inserted_at, action_type)
VALUES ('acme', 'xyz', now(), 'started_project')
USING TTL 1209600; // two weeks

Nothing special here, except the TTL which is set to expire in two weeks.

The read path is also quite simple - we always want the latest 100 actions, so it looks like this:

SELECT action_type FROM user_actions
WHERE company_id = 'acme' and employee_id = 'xyz'
LIMIT 100;

The issue: I would expect that since we order in reverse chronological order, and the TTL is always the same amount of seconds on insertion - that such a query should not scan through any tombstones - all "dead" columns are at the tail of the row, not the head. But in practice we see many warnings in the log in the following format:

WARN [ReadStage:60452] 2014-09-08 09:48:51,259 SliceQueryFilter.java (line 225) Read 40 live and 1164 tombstoned cells in profiles.user_actions (see tombstone_warn_threshold). 100 columns was requested, slices=[-], delInfo={deletedAt=1410169639669000, localDeletion=1410169639}

and on rare occasions the tombstone number is large enough to abort the query completely. Since I see this type of schema design being advocated quite often, I wonder if I'm doing something wrong here?

Your SELECT statement is not giving an explicit sort order and is hence defaulting to ASC (even though your clustering order is DESC).

So if you change your query to:

SELECT action_type FROM user_actions
WHERE company_id = 'acme' and employee_id = 'xyz'
ORDER BY inserted_at DESC
LIMIT 100;

you should be fine

Perhaps data is reappearing because a node fails and gc_grace_seconds expired already, the node comes back into the cluster, and Cassandra can't replay/repair updates because the tombstone disappeared after gc_grace_seconds: http://www.datastax.com/documentation/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html

The 2.1 incremental repair sounds like it might be right for you: http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM