
Check in one query if multiple records exist in Cassandra

I have a list of strings: "A", "B", "C".

I would like to know how I can check whether all of these strings exist in a Cassandra column.

I have two approaches that I previously used with relational databases, but I recently moved to Cassandra and I don't know how to achieve this there.

The problem is that I have about 100 strings to check, and I don't want to send 100 requests to my database. It wouldn't be wise.

Interesting question... I don't know the schema you're using, but if your strings are in the only PK column (or in a composite PK where the other columns' values are known at query time), then you could probably issue 100 queries without worries. The key cache will help avoid hitting the disks, so you could get fast responses.

If instead you intend to do this for a column that is not part of any PK, you'll have a hard time figuring this out unless you resort to some kind of trick, and any such trick comes with performance restrictions and/or increased code complexity anyway.

As an example, you could build a "frequency" table for this purpose, where you store how many times you "saw" each string "A", "B", etc., and query this table when you need to retrieve the information:

SELECT frequencies FROM freq_table WHERE pk IN ('A', 'B', 'C');

Then you still need to loop over the result set and check that each record is > 0. An alternative could be to issue a SELECT COUNT(*) before the real query, because you know in advance how many records you should get (e.g. 3 in my example), but getting the correct number of records might not be enough (e.g. one counter could be zero).
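The client-side check could look roughly like the sketch below. It assumes the query also selects the pk column (the statement above returns only frequencies) and that each row is available as a mapping; the function name missing_keys and the sample rows are made up for illustration.

```python
from typing import Iterable, Mapping, Set


def missing_keys(wanted: Iterable[str],
                 rows: Iterable[Mapping[str, object]]) -> Set[str]:
    """Return the wanted keys that have no row with a positive counter."""
    # A key counts as present only if a row came back AND its counter is > 0.
    seen = {row["pk"] for row in rows if (row["frequencies"] or 0) > 0}
    return set(wanted) - seen


# Hypothetical rows, as if returned by
# SELECT pk, frequencies FROM freq_table WHERE pk IN ('A', 'B', 'C');
rows = [{"pk": "A", "frequencies": 3}, {"pk": "B", "frequencies": 0}]
print(sorted(missing_keys(["A", "B", "C"], rows)))  # ['B', 'C']
```

Here "B" is reported as missing even though a row came back for it, which is exactly the case a plain row count would not catch.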

Of course, you'd need to maintain this table on every insert/update/delete of your main table, raising the complexity of the solution, and all the warnings related to the IN clause and COUNT apply...

I would probably stick with 100 queries: with a well-designed table they should not be a problem, unless your cluster is inadequate for the problem size you're dealing with.
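If you go with one query per key, you don't have to wait for each answer before sending the next one. Below is a minimal sketch of that idea, assuming a session object shaped like the DataStax Python driver's (execute_async returning a future whose result() gives a result set with one()). The table name my_table, the column pk, and the FakeSession stand-in are hypothetical and only there so the snippet runs without a cluster.

```python
from typing import Iterable, List

# Hypothetical table and column names; adjust to the real schema.
QUERY = "SELECT pk FROM my_table WHERE pk = %s"


def find_missing(session, keys: Iterable[str]) -> List[str]:
    """Send one single-partition query per key, then list keys with no row."""
    # Fire all queries first; execute_async returns a future-like object.
    futures = [(key, session.execute_async(QUERY, (key,))) for key in keys]
    # result() blocks until that query is done; one() is None when no row matched.
    return [key for key, future in futures if future.result().one() is None]


class _FakeResult:
    def __init__(self, row):
        self._row = row

    def one(self):
        return self._row


class _FakeFuture:
    def __init__(self, row):
        self._row = row

    def result(self):
        return _FakeResult(self._row)


class FakeSession:
    """In-memory stand-in for a driver session (illustration only)."""

    def __init__(self, existing):
        self._existing = set(existing)

    def execute_async(self, query, parameters):
        key = parameters[0]
        return _FakeFuture({"pk": key} if key in self._existing else None)


print(find_missing(FakeSession({"A", "B"}), ["A", "B", "C"]))  # ['C']
```

All strings exist when the returned list is empty. With a real session you would probably also prepare the statement once and cap the number of in-flight requests.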

CQL lets you use an IN clause, for example:

SELECT first_name, last_name FROM emp WHERE empID IN (105, 107, 104);

More information here.
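For a list of about 100 values you would build the IN statement programmatically rather than by hand. A rough sketch, reusing the emp / empID names from the example above and assuming %s-style bind markers as used for non-prepared statements in the DataStax Python driver; build_in_query is a made-up helper name.

```python
from typing import List, Sequence, Tuple


def build_in_query(keys: Sequence[int]) -> Tuple[str, List[int]]:
    """Return a CQL string with one placeholder per distinct key, plus the parameters."""
    distinct = sorted(set(keys))
    if not distinct:
        raise ValueError("need at least one key")
    placeholders = ", ".join(["%s"] * len(distinct))
    # Values are passed as bound parameters, not interpolated into the string.
    return f"SELECT empID FROM emp WHERE empID IN ({placeholders})", distinct


query, params = build_in_query([105, 107, 104, 105])
print(query)   # SELECT empID FROM emp WHERE empID IN (%s, %s, %s)
print(params)  # [104, 105, 107]
```

If empID is the whole primary key, all records exist exactly when the number of returned rows equals len(params).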

But this approach might not be the best, since it can trigger selects across all nodes in the cluster.

So it depends very much on how your data is structured.

From this perspective, it might be better to run 100 separate queries.

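Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.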

 