
Verification of Caching approach with EhCache, Spring and Hibernate

I have an application that uses a Socket connection to read positional data. Each position relates to an asset. The stream updates positional data for many hundreds of assets in real time. Here is a basic class representation of the two domain objects:

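import java.util.Date;
import java.util.Set;

// An asset and every position reported for it.
public class Asset {
    long id;
    Set<Position> positions;
}

// A single position fix read from the socket stream.
public class Position {
    long id;
    Double latitude;
    Double longitude;
    Date timestamp;
}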

Now, I want the last day of positional data to be available to remote clients for polling. There will be many hundreds of clients polling for the last day of positional data for each asset. Each asset receives a positional update every 5 seconds. The requirement is that a client response may be no more than 10 seconds out of sync with the real-time feed.

This is putting a huge load on the database - which is where EHCache comes in - maybe...

A nicer alternative (questionable!) would be to configure a cache into which any new Assets and associated Positions would be stored as they are read by the Socket Connection. This cache would expire any Asset that was updated more than a day ago and would be responsible for writing new Assets and Positions to the database periodically (every minute or so). Remote clients would hit the cache for Assets and Positional data.
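To make the idea concrete, here is a minimal sketch of such a buffer using only the JDK, with no EHCache involved. The names PositionBuffer, Fix, record, lastDay and evictAndDrain are hypothetical, and coarse synchronized methods are assumed to be good enough for illustration:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory buffer: holds one day of fixes per asset and
// queues every new fix for a periodic batch write to the database.
final class PositionBuffer {
    static final class Fix {
        final long assetId;
        final double latitude, longitude;
        final Instant at;
        Fix(long assetId, double latitude, double longitude, Instant at) {
            this.assetId = assetId;
            this.latitude = latitude;
            this.longitude = longitude;
            this.at = at;
        }
    }

    private final Map<Long, NavigableMap<Instant, Fix>> byAsset = new HashMap<>();
    private final List<Fix> pendingWrites = new ArrayList<>();

    // Called by the socket reader for every incoming fix.
    synchronized void record(Fix fix) {
        byAsset.computeIfAbsent(fix.assetId, id -> new TreeMap<>()).put(fix.at, fix);
        pendingWrites.add(fix);
    }

    // Called by polling clients: fixes for one asset from the last 24 hours.
    synchronized List<Fix> lastDay(long assetId, Instant now) {
        NavigableMap<Instant, Fix> fixes = byAsset.get(assetId);
        if (fixes == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(fixes.tailMap(now.minus(Duration.ofDays(1)), true).values());
    }

    // Called every minute or so: drops fixes older than a day, forgets assets
    // with nothing left, and returns the fixes not yet written to the database.
    synchronized List<Fix> evictAndDrain(Instant now) {
        Instant cutoff = now.minus(Duration.ofDays(1));
        for (NavigableMap<Instant, Fix> fixes : byAsset.values()) {
            fixes.headMap(cutoff, false).clear();
        }
        byAsset.values().removeIf(Map::isEmpty);
        List<Fix> batch = new ArrayList<>(pendingWrites);
        pendingWrites.clear();
        return batch;
    }
}
```

The batch returned by evictAndDrain would be handed to whatever persists positions. If the process dies between flushes, up to a minute of fixes is lost, which is the main trade-off of writing behind the cache.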

I just wanted some thoughts / advice on whether this approach might be sensible and also which features of EHCache could facilitate it.

Many thanks

This might be a good use case for EhCache and the Hibernate query cache. Here is the basic architecture. First, enable the second-level cache for the Position entity. (I think the class as shown is missing the asset field for its side of the one-to-many relationship, but that doesn't matter here.)

I assume clients are running query similar to this one:

SELECT p
FROM Position p
WHERE p.timestamp >= :timestamp
  AND p.asset = :asset

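The :timestamp parameter marks the start of the 24-hour window (current time minus 24 hours). You need to set the cacheable query hint on this query (setCacheable(true) on a Hibernate Query, or the org.hibernate.cacheable hint through JPA).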

Here's what happens: the client runs this query indirectly with a pair of (asset, timestamp) parameters. Hibernate tries to find a query result in query cache for this pair which (simplifying) makes up a query cache key.

If the query result is missing, Hibernate runs the query and puts the results in the query cache. But it stores only the ids of the matching Position instances, not the instances themselves. The next time a client asks for the same (asset, timestamp) pair, Hibernate finds the result in the query cache. Then, having only ids, it looks up the Position instances in the second-level cache.
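The lookup described above can be imitated with three plain maps. This is a toy model, not Hibernate code: TwoLevelLookup, its fields and the databaseHits counter are all hypothetical names, and the real caches do considerably more (regions, invalidation, eviction).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain maps stand in for the database table,
// the query cache and the second-level (entity) cache.
final class TwoLevelLookup {
    static final class Position {
        final long id, assetId, epochSecond;
        Position(long id, long assetId, long epochSecond) {
            this.id = id;
            this.assetId = assetId;
            this.epochSecond = epochSecond;
        }
    }

    final Map<Long, Position> table = new HashMap<>();          // the "database"
    final Map<String, List<Long>> queryCache = new HashMap<>(); // (asset, timestamp) -> ids
    final Map<Long, Position> entityCache = new HashMap<>();    // id -> entity
    int databaseHits = 0;

    List<Position> positionsSince(long assetId, long sinceEpochSecond) {
        String key = assetId + ":" + sinceEpochSecond;
        List<Long> ids = queryCache.get(key);
        if (ids == null) {
            // Query cache miss: run the query once and remember only the ids.
            databaseHits++;
            ids = new ArrayList<>();
            for (Position p : table.values()) {
                if (p.assetId == assetId && p.epochSecond >= sinceEpochSecond) {
                    ids.add(p.id);
                    entityCache.put(p.id, p);
                }
            }
            queryCache.put(key, ids);
        }
        List<Position> result = new ArrayList<>();
        for (long id : ids) {
            Position p = entityCache.get(id);
            if (p == null) {
                // Entity evicted from the entity cache: one extra lookup per id.
                databaseHits++;
                p = table.get(id);
                entityCache.put(id, p);
            }
            result.add(p);
        }
        return result;
    }
}
```

Calling positionsSince twice with the same arguments costs one simulated database hit in total. Clearing entityCache between the calls costs one more hit per id, even though the query cache still answers.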

As you can see this scenario is quite complex and several factors will influence the overall success:

  • How many distinct (asset, timestamp) pairs are there? Roughly 86,400 (the number of distinct seconds in a day) times the number of assets. All of these keys and their values must fit in the cache, and each value is a list of Position ids. That's a lot of memory. You can probably cut this down by limiting the number of distinct timestamps. Does the window have to be exactly the last 24 hours? Could it be anywhere between 23 and 24 hours? If so, you can round the timestamps and shrink the key space (and with it the cache size).

  • All Position instances referenced by id from the query cache must fit in the second-level cache. This might be huge. Otherwise you will hit the N + 1 problem: when a Position entity is absent from the L2 cache while a result is being read from the query cache, Hibernate implicitly fetches it from the database by id, one query per missing entity.

  • Cache invalidation is performed by Hibernate. Remember, however, that each insert into the Position table invalidates the query cache. With new positions arriving every few seconds, fully warming up the cache might never be possible.
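For the first point, rounding the window start could look like the sketch below. WindowStart, roundedDayAgo and keySpace are hypothetical names, and the figure of 500 assets mentioned afterwards is an assumption, not a number from the question. Rounding down returns slightly more than 24 hours of data. Rounding up instead would give slightly less.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: computes a rounded :timestamp parameter so that many
// clients share the same query cache key instead of one key per second.
final class WindowStart {
    // Now minus 24 hours, rounded down to a multiple of the granularity.
    static Instant roundedDayAgo(Instant now, Duration granularity) {
        long step = granularity.getSeconds();
        long dayAgo = now.minus(Duration.ofHours(24)).getEpochSecond();
        return Instant.ofEpochSecond(Math.floorDiv(dayAgo, step) * step);
    }

    // Upper bound on the distinct (asset, timestamp) keys produced in one day.
    static long keySpace(long assets, Duration granularity) {
        return assets * (Duration.ofDays(1).getSeconds() / granularity.getSeconds());
    }
}
```

With 500 assets, one-second granularity allows up to 500 × 86,400 = 43,200,000 keys per day, while ten-minute granularity allows 500 × 144 = 72,000.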

That being said, you should give query caching in Hibernate a try. Be aware, though, that the query cache is tricky and takes a lot of tuning to get right.

Tip: if you are running out of memory, try overflowing EhCache to disk. I have never tried it, but I believe fetching serialized cache values from local disk (especially an SSD) might be much faster than a cache miss followed by a database query.
