
Check in one query if multiple records exist in Cassandra

I have a list of Strings "A", "B", "C".

I would like to know how I can check whether all of these Strings exist in a Cassandra column.

I have two approaches that I previously used for relational databases, but I recently moved to Cassandra and I don't know how to achieve this.

The problem is that I have about 100 strings to check and I don't want to send 100 requests to my database. It wouldn't be wise.

Interesting question... I don't know the schema you're using, but if your strings are in the only PK column (or in a composite PK where the other columns' values are known at query time), then you could probably issue 100 queries without worries. The key cache will help avoid hitting the disks, so you could get fast responses.

If instead you intend to use this for a column that is not part of any PK, you'll have a hard time figuring this out unless you resort to some tricks, and all of this is subject to performance restrictions and/or increased code complexity anyway.

As an example, you could build a "frequency" table for this purpose, where you store how many times you "saw" each string "A", "B", etc., and query this table when you need the information:

SELECT frequencies FROM freq_table WHERE pk IN ('A', 'B', 'C');

Then you still need to loop over the result set and check that each record is > 0. An alternative could be to issue a SELECT COUNT(*) before the real query, because you know in advance how many records you should get (e.g. 3 in my example), but getting the correct number of records might not be enough on its own (e.g. one of the counters could be zero).
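The client-side check could look roughly like the sketch below. It assumes the query also selects pk, so that each row arrives as a (pk, frequencies) pair; the function name all_present and the sample rows are made up for illustration and are not part of any driver API.

```python
def all_present(wanted, rows):
    """Return True only if every wanted key has a row with a counter > 0.

    `rows` is assumed to be an iterable of (pk, frequencies) pairs, e.g. the
    result of selecting pk and frequencies from the hypothetical freq_table.
    """
    counts = {pk: freq for pk, freq in rows}
    # A missing row and a zero counter both mean "not seen".
    return all(counts.get(key, 0) > 0 for key in wanted)


# Sample data standing in for a real result set.
rows = [("A", 3), ("B", 0), ("C", 1)]
print(all_present(["A", "B", "C"], rows))  # False: B has a zero counter
print(all_present(["A", "C"], rows))       # True
```

Treating a missing row and a zero counter the same way covers both cases mentioned above, so no separate COUNT(*) is needed.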

Of course you'd need to maintain this table on every insert/update/delete of your main table, which raises the complexity of the solution, and of course all the usual warnings about the IN clause and COUNT apply...

I would probably stick with 100 queries: with a well designed table they should not be a problem, unless you have an inadequate cluster for the problem size you're dealing with.
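To give an idea, the 100 lookups don't have to run one after another. Here is a minimal sketch that runs them in parallel from a thread pool; the exists callable is a placeholder I made up, and in a real application it would wrap a single-partition query through your driver (the table and column names in the comment are assumptions too).

```python
from concurrent.futures import ThreadPoolExecutor


def all_exist(keys, exists, max_workers=16):
    """Run one existence check per key in parallel; True if all succeed.

    `exists` is a caller-supplied callable (hypothetical here) that performs a
    single-partition lookup and returns True when the key is found.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return all(pool.map(exists, keys))


# Stand-in for the database: a real `exists` would run something like
# SELECT pk FROM my_table WHERE pk = ? LIMIT 1 (table and column names assumed).
stored = {"A", "B", "C"}
print(all_exist(["A", "B", "C"], stored.__contains__))  # True
print(all_exist(["A", "Z"], stored.__contains__))       # False
```

Most Cassandra drivers also offer asynchronous execution, which would probably be a better fit than threads, but the shape of the check stays the same.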

CQL lets you use an IN clause, like:

SELECT first_name, last_name FROM emp WHERE empID IN (105, 107, 104);

More information here .

But this approach might not be the best, since it can trigger selects across many nodes in the cluster.

So it depends very much on how your data is structured.

From this perspective, it might be better to run 100 separate queries.
