简体   繁体   中英

Java time ordered data structure

I am processing a high frequency stream of timestamped events with no ordering guarantee (ordered 90% of the time). I need to store these events (for caching purpose) for some times in my program. In order to optimize the performances of my computations (that require mainly iteration over the events collection) it would be much more easier for me if I could guarantee the order by caching an ordered list. So what I am looking for is an ordered data structure that is fast in insertion and iteration and allow duplicates.

Among all propositions I have found on internet, I have tried :
- TreeSet -> Does not work because I might have duplicate timestamp
- PriorityQueue -> Does not work because the iterator does not guarantee priority order

Since 9/10 events are well ordered I thought I could use a basic ArrayList with a modified version of the add method :

public class TimeOrderedArrayList<E> extends ArrayList<E>{

private long lastTs;
private Comparator<E> comparator;
private TimeGetter<E> tsgetter;

public TimeOrderedArrayList (Comparator<E> comparator, TimeGetter<E> tsgetter) {
    super();
    this.comparator = comparator;
    this.tsgetter = tsgetter;
    this.lastTs = Long.MIN_VALUE;
}


@Override
public boolean add(E e) {
    if (tsgetter.getTime(e) >= lastTs) {
        lastTs = tsgetter.getTime(e);
        return super.add(e);
    } else {

        // VERSION 1
        int index = super.size()-1;
        while (tsgetter.getTime(super.get(index))>tsgetter.getTime(e) && index > 0) {
            index--;
        }
        super.add(index, e);

        // VERSION 2
        int index = Collections.binarySearch(this, e, comparator);
        super.add(index>-1 ? index : -index-1,e);
        return true;
    }
}

@Override
public boolean addAll(Collection<? extends E> c) {
    boolean result = super.addAll(c);
    super.sort(comparator);
    return result;
}
}

But for both version I get really bad performances.

Any suggestions ?

I know you have dismissed it already but I would still suggest TreeSet .

There is actually no problem in having duplicate timestamps. The only condition is that the comparator is consistent with equals . Nothing more.

So yes, in the first approach if you just compare timestamps of events then that would probably not consistent with equals. But if you also compare other fields of your events then this will be consistent with equals.

This assumes, of course, that the timestamp is a part of E , the event class.

From the problem description, it appears to me that a strict order is not mandatory for the problem as long as you can have iteration over the events collection over a period of time. Also, the kind of data you mention seems to be one where multiple client nodes are sending data to one centralized server (could be logs/events accumulation from multiple services).

If that is the case, you could explore using a simple array of buckets, where an event corresponding to a timestamp goes into a specific bucket only. You will ensure that all events which have very near timestamps are classified into the same buckets, so that you could achieve partial order between the events.

For Example: If you need data for last 1 minute (60 seconds), you could define 60 buckets, one for each second, and keep rotating them. An event timestamped 2016-12-08 19:59:29.538331 goes to the 29th bucket (assuming indices start from 0, and you take a floor of the seconds of every event). When a minute lapses, simply clear away past data for the ith bucket, and start building it afresh. So, at 2016-12-08 20:00:00.129845 , the 0th bucket is reset to an empty array.

Since you have a high frequency stream of timestamped events, chances of empty buckets etc will be minimal. You can tweak around the number of buckets required based on your exact requirements.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM