Is it possible to get multiple values from an Ignite cache by their keys, applying additional filtering server-side, in one operation?

I have an Ignite cache:

IgniteCache<String, Record> cache;

A collection of keys of this cache is given. I need to do the following:

  1. Get records with the specified keys
  2. ... but additionally filter them by some logic defined dynamically (like 'where field name has value John')
  3. ... do it as fast as possible
  4. ... under a transaction

One way I tried was to use the getAll() method and apply the filtering on the client side:

cache.getAll(keys).values().stream()
        .filter(... filter logic...)
        .collect(toList());

This works, but if the additional filter has high selectivity (i.e. it rejects a lot of data), we waste a lot of time sending unneeded data over the network.
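To make the client-side variant concrete, here is a minimal sketch that uses a plain Map as a stand-in for the Map returned by getAll(keys). The PersonRecord class, its name field, and the nameEquals helper are hypothetical; they only illustrate a filter chosen at runtime.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical value type; stands in for the Record class from the question.
class PersonRecord {
    final String name;

    PersonRecord(String name) {
        this.name = name;
    }
}

class ClientSideFilter {
    // Builds a filter at runtime, e.g. "field name has value John".
    static Predicate<PersonRecord> nameEquals(String expected) {
        return r -> expected.equals(r.name);
    }

    // 'fetched' plays the role of the Map returned by cache.getAll(keys):
    // every value has already crossed the network before the filter runs.
    static List<PersonRecord> filter(Map<String, PersonRecord> fetched,
                                     Predicate<PersonRecord> predicate) {
        return fetched.values().stream()
                .filter(predicate)
                .collect(Collectors.toList());
    }
}
```

The cost is visible in the signature: the whole Map is an input, so rejected entries are transferred and then discarded.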

Another option is using a scan:

cache.query(new ScanQuery<>(new IsKeyIn(keys).and(new CustomFilter())))

This does all the filtering on the server nodes, but it is a full scan. If the cache holds many entries and the input keys are only a small fraction of them, a lot of time is wasted again, this time on unneeded scanning.

There is also invokeAll(), which allows filtering on the server nodes:

cache.invokeAll(new TreeSet<>(keys), new AdditionalFilter())
        .values().stream()
        .map(EntryProcessorResult::get)
        .collect(toList());

where

private static class AdditionalFilter implements CacheEntryProcessor<String, Record, Record> {
    @Override
    public Record process(MutableEntry<String, Record> entry,
            Object... arguments) throws EntryProcessorException {
        if (... record matches the filter ...) {
            return entry.getValue();
        }
        return null;
    }
}

It finds entries by their keys and executes the filtering logic on the server nodes, but on my data it is even slower than the scan. I suppose (but I am not sure) this is because invokeAll() is potentially an updating operation, so (according to its Javadoc) it takes locks on the corresponding keys.

I would like to be able to find entries by the given keys, apply additional filtering on the server nodes, and not pay for additional locks (in my case it is a read-only operation).

Is it possible?

My cache is distributed among 3 server nodes, and its atomicity mode is TRANSACTIONAL_SNAPSHOT. The reads are done under a transaction.

  1. SQL is the simplest solution, and possibly the fastest, given proper indexes.

  2. IgniteCompute#broadcast + IgniteCache#localPeek:

Collection<Key> keys = ...;
// One closure invocation per server node; each node returns only its own matches.
Collection<Collection<Value>> results = compute.broadcast(new LocalGetter(), keys);

...

class LocalGetter implements IgniteClosure<Collection<Key>, Collection<Value>>
{
    @Override public Collection<Value> apply(Collection<Key> keys) {
        IgniteCache<Key, Value> cache = ...;

        Collection<Value> res = new ArrayList<>(keys.size());

        for (Key key : keys) {
            // Local lookup only: returns null unless this node is primary for the key.
            Value val = cache.localPeek(key, CachePeekMode.PRIMARY);

            // Filter on the server, so non-matching values never leave the node.
            if (val != null && filterMatches(val)) {
                res.add(val);
            }
        }

        return res;
    }
}

This way we retrieve cache entries efficiently by key, then apply the filter locally, and only send matching entries back over the network. There are only N network calls, where N is the number of server nodes.
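For option 1, here is a hedged sketch of how the key list and the dynamic filter might be combined into one parameterised statement. It assumes the cache is configured with a query entity so that the value type is visible as an SQL table named Record with an indexed name column. That table name, the column, and the KeyFilterSql helper are assumptions and are not taken from the question. _key and _val are the pseudo-columns Ignite exposes for the cache key and value.

```java
import java.util.*;

class KeyFilterSql {
    // Builds: SELECT _key, _val FROM Record WHERE _key IN (?, ?, ...) AND name = ?
    // "Record" and "name" are assumed table/column names.
    static String sql(int keyCount) {
        if (keyCount <= 0) {
            throw new IllegalArgumentException("at least one key is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT _key, _val FROM Record WHERE _key IN (" + placeholders + ") AND name = ?";
    }

    // Positional arguments in the same order as the placeholders:
    // all keys first, then the value for the name filter.
    static Object[] args(Collection<String> keys, String name) {
        List<Object> all = new ArrayList<>(keys);
        all.add(name);
        return all.toArray();
    }
}
```

The statement and arguments would then go to the cache as cache.query(new SqlFieldsQuery(sql).setArgs(args)). With a large key set, a long IN list may not be planned efficiently, so it is worth checking the plan with EXPLAIN before relying on this.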
