
Getting database entry when performing delete operation in Cassandra

I have a web service that maintains the state of a "request". The possible states are "Active" and "InActive". I am storing the request information in a Cassandra DB, in two tables: one for Active requests and another for InActive requests. Both tables have the same schema.

My schema is as follows:

-- InActiveRequests uses the identical schema.
CREATE TABLE ActiveRequests (
  UserId text,
  RequestId int,
  RequestData text,
  PRIMARY KEY (UserId, RequestId)
);

I need to implement an API that moves a request from the Active state to the InActive state. I plan to do this by deleting the entry from the Active table and then adding the removed entry to the InActive table.

In Cassandra, a DELETE operation doesn't seem to return the data that was deleted. So I have to do a SELECT on the request entry (so that I get all the request data to add to the InActive table) and then do a DELETE. Is there a better way to do this?
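A minimal sketch of the SELECT-then-move approach described above, in Python. It assumes a DataStax cassandra-driver `Session` (or any object with the same `execute(query, params)` method returning a result with `.one()`). The function and constant names are hypothetical, and grouping the INSERT and DELETE into a logged batch is an assumption, not part of the original plan:

```python
# Hypothetical helper: moves one request from ActiveRequests to InActiveRequests.
# `session` is assumed to behave like a cassandra-driver Session.

SELECT_ACTIVE = (
    "SELECT RequestData FROM ActiveRequests "
    "WHERE UserId = %s AND RequestId = %s"
)

# Logged batch: if any part of it is applied, Cassandra ensures the rest is
# eventually applied too (atomicity, not isolation).
MOVE_BATCH = (
    "BEGIN BATCH "
    "INSERT INTO InActiveRequests (UserId, RequestId, RequestData) "
    "VALUES (%s, %s, %s); "
    "DELETE FROM ActiveRequests WHERE UserId = %s AND RequestId = %s; "
    "APPLY BATCH"
)


def move_to_inactive(session, user_id, request_id):
    """Return True if the request was found and moved, False otherwise."""
    # Step 1: read the row first, because DELETE does not return deleted data.
    row = session.execute(SELECT_ACTIVE, (user_id, request_id)).one()
    if row is None:
        return False
    # Unquoted CQL identifiers are stored lowercase, so the driver exposes
    # the column as `requestdata`.
    data = row.requestdata
    # Step 2: write the copy and the tombstone together in one batch.
    session.execute(MOVE_BATCH, (user_id, request_id, data, user_id, request_id))
    return True
```

The read and the batch are separate round trips, so a concurrent update to the same row between them is not prevented by this sketch.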

EDIT

You may ask why I am keeping Active and InActive requests in separate tables. I could combine them into a single table with an IsActive column. My reasoning for separate tables is as follows:

I want my queries to the Active table to be very fast. Querying all the Active requests in a table that holds both Active and InActive requests would be less efficient. The partition key is UserId, and I expect the InActive table to hold several thousand RequestIds for a given UserId, whereas the Active table should only have 10 or so RequestIds per UserId.

The basic answer to having DELETE return the data is that Cassandra simply can't do that. A delete in Cassandra is actually a write of a tombstone. In general, Cassandra does not read before writing, and a design that needs read-before-write is considered an anti-pattern.

Another thing to remember is that a delete in Cassandra does not remove the data from the system right away: it stays until some time after the table's GC grace period (the gc_grace_seconds setting) has elapsed.
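To make "some time after" concrete: a tombstone only becomes eligible for purging once gc_grace_seconds have passed since the delete, and it is actually dropped by a later compaction. A small Python sketch of that arithmetic, assuming the default gc_grace_seconds of 864000 (10 days); the function name is hypothetical:

```python
from datetime import timedelta

# Cassandra's default gc_grace_seconds table setting: 864000 s (10 days).
DEFAULT_GC_GRACE_SECONDS = 864_000


def earliest_purge_time(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest moment a tombstone written at `deleted_at` may be purged.

    The data is only physically removed by a compaction that runs after
    this point, so the real removal time is usually later.
    """
    return deleted_at + timedelta(seconds=gc_grace_seconds)
```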

Are these requests time-based at all? If they are, you could think about bucketing the requests. You would then have a single table, something like:

CREATE TABLE Requests (
  UserId text,
  TimeBucket text,
  RequestId int,
  RequestData text,
  Active boolean,
  PRIMARY KEY ((UserId, TimeBucket), RequestId)
);

The time buckets could be per hour, per minute, or whatever granularity makes sense for your use case. You can then work through the relevant buckets with separate SELECTs, one per bucket. This keeps any single partition key from accumulating too many requests. The assumption is that a time bucket is big enough to cover most of the active requests, so you don't end up needing to look at all the buckets.
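A minimal sketch of the bucketing idea in Python, assuming hourly buckets stored as UTC strings such as "2024-05-01T13" (the format and the helper names are hypothetical). It computes the bucket for a timestamp and builds one SELECT per partition, newest bucket first, so a caller looking for active requests can stop once it has scanned enough recent buckets:

```python
from datetime import timedelta, timezone

# Hypothetical bucket format: one bucket per UTC hour, e.g. "2024-05-01T13".
BUCKET_FORMAT = "%Y-%m-%dT%H"

SELECT_BUCKET = (
    "SELECT RequestId, RequestData, Active FROM Requests "
    "WHERE UserId = %s AND TimeBucket = %s"
)


def time_bucket(ts):
    """Map a timezone-aware timestamp to its hourly TimeBucket string (UTC)."""
    return ts.astimezone(timezone.utc).strftime(BUCKET_FORMAT)


def buckets_between(start, end):
    """Every hourly bucket from start to end inclusive, newest first."""
    # Truncate to the hour so the loop steps exactly on bucket boundaries.
    current = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    last = end.astimezone(timezone.utc)
    buckets = []
    while current <= last:
        buckets.append(current.strftime(BUCKET_FORMAT))
        current += timedelta(hours=1)
    buckets.reverse()
    return buckets


def bucket_queries(user_id, start, end):
    """One (query, params) pair per partition to scan, newest bucket first."""
    return [(SELECT_BUCKET, (user_id, b)) for b in buckets_between(start, end)]
```

Because Active is not part of the primary key, the query returns it and leaves filtering to the client; a `WHERE Active = true` clause on this table would require ALLOW FILTERING or a secondary index.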

I'm also not sure how long you plan to keep records. If they are kept for a long time or forever, this bucketing ensures you don't end up with overly large partitions, which could happen in the InActive table with your current setup.
