简体   繁体   中英

How can I run a background thread that cleans up some elements in list regularly?

I am currently implementing cache. I have completed basic implementation, like below. What I want to do is to run a thread that will remove entry that satisfy certain conditions.

class Cache {
    int timeLimit = 10; //how long each entry needs to be kept after accessed(marked)
    int maxEntries = 10; //maximum number of Entries
    HashSet<String> set = new HashSet<String>();   
    public void add(Entry t){
        ....
    }

    public Entry access(String key){
        //mark Entry that it has been used
        //Since it has been marked, background thread should remove this entry after timeLimit seconds.
        return set.get(key);
    }
    ....
}

My question is, how should I implement background thread so that the thread will go around the entries in set and remove the ones that has been marked && (last access time - now)>timeLimit ?

edit

Above is just simplified version of codes, that I did not write synchronized statements.

Why are you reinventing the wheel? EhCache (and any decent cache implementation) will do this for you. Also much more lightweight MapMaker Cache from Guava can automatically remove old entries.

If you really want to implement this yourself, it is not really that simple.

  1. Remember about synchronization. You should use ConcurrentHashMap or synchronized keyword to store entries. This might be really tricky.

  2. You must store last access time somehow of each entry somehow. Every time you access an entry, you must update that timestamp.

  3. Think about eviction policy. If there are more than maxEntries in your cache, which ones to remove first?

  4. Do you really need a background thread?

    This is surprising, but EhCache (enterprise ready and proven) does not use background thread to invalidate old entries). Instead it waits until the map is full and removes entries lazily. This looks like a good trade-off as threads are expensive.

  5. If you have a background thread, should there be one per cache or one global? Do you start a new thread while creating a new cache or have a global list of all caches? This is harder than you think...

Once you answer all these questions, the implementation is fairly simple: go through all the entries every second or so and if the condition you've already written is met, remove the entry.

First, make access to your collection either synchronized or use ConcurrentHashSet a ConcurrentHashMap based Set as indicated in the comments below.

Second, write your new thread, and implement it as an endless loop that periodically iterates the prior collection and removes the elements. You should write this class in a way that it is initialized with the correct collection in the constructor, so that you do not have to worry about "how do I access the proper collection".

I'd use Guava 's Cache type for this, personally. It's already thread-safe and has methods built in for eviction from the cache based on some time limit. If you want a thread to periodically sweep it, you can just do something like this:

    new Thread(new Runnable() {
        public void run() {
            cache.cleanUp();
            try { Thread.sleep(MY_SLEEP_DURATION); } catch (Exception e) {};
        }
    }).start();

I don't imagine you really need a background thread. Instead you can just remove expired entries before or after you perform a lookup. This simplifies the entire implementation and its very hard to tell the difference.

BTW: If you use a LinkedHashMap, you can use it as a LRU cache by overriding removeEldestEntry (see its javadocs for an example)

First of all, your presented code is incomplete because there is no get(key) on HashSet (so I assume you mean some kind of Map instead) and your code does not mention any "marking." There are also many ways to do caching, and it is difficult to pick out the best solution without knowing what you are trying to cache and why.

When implementing a cache, it is usually assumed that the data-structure will be accessed concurrently by multiple threads. So the first thing you will need to do, is to make use of a backing data-structure that is thread-safe. HashMap is not thread-safe, but ConcurrentHashMap is. There are also a number of other concurrent Map implementations out there, namely in Guava , Javolution and high-scale lib . There are other ways to build caches besides maps, and their usefulness depends on your use case. Regardless, you will most likely need to make the backing data-structure thread-safe, even if you decide you don't need the background thread and instead evict expired objects upon attempting to retrieve them from the cache. Or letting the GC remove the entries by using SoftReference s.

Once you have made the internals of your cache thread-safe, you can simply fire up a new (most likely daemonized) thread that periodically sweeps/iterates the cache and removes old entries. The thread would do this in a loop (until interrupted, if you want to be able to stop it again) and then sleep for some amount of time after each sweep.

However, you should consider whether it is worth it for you, to build your own cache implementation. Writing thread-safe code is not easy, and I recommend that you study it before endeavouring to write your own cache implementation. I can recommend the book Java Concurrency in Practice.

The easier way to go about this is, of course, to use an existing cache implementation. There are many options available in Java-land, all with their own unique set of trade-offs.

  • EhCache and JCS are both general purpose caches that fit most caching needs one would find in a typical "enterprise" application.
  • Infinispan is a cache that is optimised for distributed use, and can thus cache more data than what can fit on a single machine. I also like its ConcurrentMap based API.
  • As others have mentioned, Googles Guava library has a Cache API, which is quite useful for smallish in-memory caches.

Since you want to limit the number of entries in the cache, you might be interested in an object-pool instead of a cache.

  • Apache Commons-Pool is widely used, and has APIs that resemble what you are trying to build yourself.
  • Stormpot , on the other hand, has a rather different API, and I am pretty much only mentioning it because I wrote it. It's probably not what you want, but who can be sure without knowing what you are trying to cache and why?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM