
How can I run a background thread that regularly cleans up some elements in a list?

I am currently implementing a cache. I have completed a basic implementation, shown below. What I want to do is run a thread that removes entries that satisfy certain conditions.

class Cache {
    int timeLimit = 10; //how long each entry needs to be kept after accessed(marked)
    int maxEntries = 10; //maximum number of Entries
    HashSet<String> set = new HashSet<String>();   
    public void add(Entry t){
        ....
    }

    public Entry access(String key){
        //mark Entry that it has been used
        //Since it has been marked, background thread should remove this entry after timeLimit seconds.
        return set.get(key);
    }
    ....
}

My question is: how should I implement the background thread so that it goes through the entries in set and removes the ones that have been marked && (now - last access time) > timeLimit?

Edit

The above is just a simplified version of the code; I have left out the synchronized statements.

Why are you reinventing the wheel? EhCache (and any decent cache implementation) will do this for you. The much more lightweight MapMaker Cache from Guava can also remove old entries automatically.

If you really want to implement this yourself, it is not really that simple.

  1. Remember synchronization. You should use ConcurrentHashMap or the synchronized keyword to store entries. This might be really tricky.

  2. You must somehow store the last access time of each entry. Every time you access an entry, you must update that timestamp.

  3. Think about the eviction policy. If there are more than maxEntries in your cache, which ones do you remove first?

  4. Do you really need a background thread?

    This is surprising, but EhCache (enterprise ready and proven) does not use a background thread to invalidate old entries. Instead it waits until the map is full and removes entries lazily. This looks like a good trade-off, as threads are expensive.

  5. If you have a background thread, should there be one per cache or one global thread? Do you start a new thread when creating a new cache, or keep a global list of all caches? This is harder than you think...

Once you answer all these questions, the implementation is fairly simple: go through all the entries every second or so and if the condition you've already written is met, remove the entry.
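To make points 1 and 2 and the sweep itself concrete, here is a minimal sketch. The class and method names (ExpiringCache, Slot, sweep) are made up for illustration, and the current time is passed in as a parameter (an assumption made so the logic is easy to test; in real code you would probably read System.currentTimeMillis()). It does not address maxEntries or which thread calls sweep.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; V stands in for the question's Entry type.
class ExpiringCache<V> {
    private static final class Slot<V> {
        final V value;
        volatile boolean marked;         // set once the entry has been accessed
        volatile long lastAccessMillis;  // updated on every access
        Slot(V value, long now) { this.value = value; this.lastAccessMillis = now; }
    }

    private final Map<String, Slot<V>> slots = new ConcurrentHashMap<>();
    private final long timeLimitMillis;

    ExpiringCache(long timeLimitMillis) { this.timeLimitMillis = timeLimitMillis; }

    void add(String key, V value, long now) { slots.put(key, new Slot<>(value, now)); }

    // Returns the value (or null) and marks the entry as used at time 'now'.
    V access(String key, long now) {
        Slot<V> s = slots.get(key);
        if (s == null) return null;
        s.lastAccessMillis = now;
        s.marked = true;
        return s.value;
    }

    // One sweep: drop entries that are marked and idle for longer than the limit.
    void sweep(long now) {
        slots.values().removeIf(s -> s.marked && now - s.lastAccessMillis > timeLimitMillis);
    }

    int size() { return slots.size(); }
}
```

Note that a sweep can still race with a concurrent access and remove an entry that was touched a moment ago. For a cache that is usually acceptable, but it is one of the reasons this is trickier than it looks.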

First, either make access to your collection synchronized or use a ConcurrentHashMap-based Set, as indicated in the comments below.

Second, write your new thread, and implement it as an endless loop that periodically iterates the prior collection and removes the elements. You should write this class in a way that it is initialized with the correct collection in the constructor, so that you do not have to worry about "how do I access the proper collection".
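As a rough sketch of that second step, assuming a hypothetical Sweeper class that receives both the set and the removal condition through its constructor (neither name comes from the question):

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper: the set and the "should this go?" test are passed in.
class Sweeper<T> implements Runnable {
    private final Set<T> entries;
    private final Predicate<T> expired;
    private final long pauseMillis;

    Sweeper(Set<T> entries, Predicate<T> expired, long pauseMillis) {
        this.entries = entries;
        this.expired = expired;
        this.pauseMillis = pauseMillis;
    }

    // A single pass over the collection.
    void sweepOnce() { entries.removeIf(expired); }

    @Override
    public void run() {
        // Loop until someone interrupts the thread.
        while (!Thread.currentThread().isInterrupted()) {
            sweepOnce();
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag so the loop exits
            }
        }
    }
}
```

You could start it with something like new Thread(new Sweeper<>(set, e -> isExpired(e), 1000)).start(), where set comes from ConcurrentHashMap.newKeySet() and isExpired is whatever check you have written.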

I'd use Guava's Cache type for this, personally. It's already thread-safe and has methods built in for eviction from the cache based on some time limit. If you want a thread to periodically sweep it, you can just do something like this:

    new Thread(new Runnable() {
        public void run() {
            // Repeat the sweep until the thread is interrupted.
            while (true) {
                cache.cleanUp();
                try { Thread.sleep(MY_SLEEP_DURATION); }
                catch (InterruptedException e) { return; }
            }
        }
    }).start();

I don't imagine you really need a background thread. Instead you can just remove expired entries before or after you perform a lookup. This simplifies the entire implementation, and it's very hard to tell the difference.
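A minimal sketch of that idea, with made-up names (LazyExpiringCache, Timed) and the current time passed in explicitly, which is an assumption made only to keep the example deterministic:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: expiry is checked during lookups, so no extra thread exists.
class LazyExpiringCache<V> {
    private static final class Timed<V> {
        final V value;
        long lastAccessMillis;
        Timed(V value, long now) { this.value = value; this.lastAccessMillis = now; }
    }

    private final Map<String, Timed<V>> map = new HashMap<>();
    private final long timeLimitMillis;

    LazyExpiringCache(long timeLimitMillis) { this.timeLimitMillis = timeLimitMillis; }

    synchronized void put(String key, V value, long now) {
        map.put(key, new Timed<>(value, now));
    }

    synchronized V get(String key, long now) {
        // Purge everything that has expired before answering the lookup.
        map.values().removeIf(t -> now - t.lastAccessMillis > timeLimitMillis);
        Timed<V> hit = map.get(key);
        if (hit == null) return null;
        hit.lastAccessMillis = now;
        return hit.value;
    }

    synchronized int size() { return map.size(); }
}
```

The purge walks the whole map on every lookup, which is fine for ten entries but would need something smarter (for example, purging only every so often) for a large cache.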

BTW: If you use a LinkedHashMap, you can use it as an LRU cache by overriding removeEldestEntry (see its javadocs for an example).
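Something along these lines should work; LruCache is a name chosen here for illustration, and the access-order constructor flag is what turns insertion order into least-recently-used order:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache built on LinkedHashMap.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    // Called by put(); returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so for concurrent use you would still need to wrap it, for instance with Collections.synchronizedMap.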

First of all, your presented code is incomplete because there is no get(key) on HashSet (so I assume you mean some kind of Map instead) and your code does not mention any "marking." There are also many ways to do caching, and it is difficult to pick out the best solution without knowing what you are trying to cache and why.

When implementing a cache, it is usually assumed that the data structure will be accessed concurrently by multiple threads. So the first thing you will need to do is make use of a backing data structure that is thread-safe. HashMap is not thread-safe, but ConcurrentHashMap is. There are also a number of other concurrent Map implementations out there, namely in Guava, Javolution and high-scale lib. There are other ways to build caches besides maps, and their usefulness depends on your use case. Regardless, you will most likely need to make the backing data structure thread-safe, even if you decide you don't need the background thread and instead evict expired objects upon attempting to retrieve them from the cache, or let the GC remove the entries by using SoftReferences.

Once you have made the internals of your cache thread-safe, you can simply fire up a new (most likely daemonized) thread that periodically sweeps the cache and removes old entries. The thread would do this in a loop (until interrupted, if you want to be able to stop it again), sleeping for some amount of time after each sweep.
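If you would rather not write the loop and sleep by hand, a ScheduledExecutorService from java.util.concurrent can do the same job. This is a swapped-in alternative to the raw thread described above, and SweepScheduler and the sweep parameter are hypothetical names:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; 'sweep' is whatever removes old entries from your cache.
class SweepScheduler {
    static ScheduledExecutorService start(Runnable sweep, long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-sweeper");
            t.setDaemon(true); // do not keep the JVM alive just for the sweeper
            return t;
        });
        // Wait periodMillis after each sweep finishes before starting the next one.
        exec.scheduleWithFixedDelay(sweep, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

Calling shutdownNow() on the returned executor stops the sweeping. Be aware that if the sweep task throws an exception, the executor suppresses all later runs, so the task should catch its own errors.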

However, you should consider whether it is worth it for you to build your own cache implementation. Writing thread-safe code is not easy, and I recommend that you study it before endeavouring to write your own cache implementation. I can recommend the book Java Concurrency in Practice.

The easier way to go about this is, of course, to use an existing cache implementation. There are many options available in Java-land, all with their own unique set of trade-offs.

  • EhCache and JCS are both general purpose caches that fit most caching needs one would find in a typical "enterprise" application.
  • Infinispan is a cache that is optimised for distributed use, and can thus cache more data than what can fit on a single machine. I also like its ConcurrentMap based API.
  • As others have mentioned, Google's Guava library has a Cache API, which is quite useful for smallish in-memory caches.

Since you want to limit the number of entries in the cache, you might be interested in an object-pool instead of a cache.

  • Apache Commons-Pool is widely used, and has APIs that resemble what you are trying to build yourself.
  • Stormpot, on the other hand, has a rather different API, and I am pretty much only mentioning it because I wrote it. It's probably not what you want, but who can be sure without knowing what you are trying to cache and why?

