
Java: Sum two or more time series

I have multiple time series:

       x
|    date    | value |
| 2017-01-01 |   1   |
| 2017-01-05 |   4   |
|     ...    |  ...  |

       y
|    date    | value |
| 2017-01-03 |   3   |
| 2017-01-04 |   2   |
|     ...    |  ...  |

Frustratingly, my dataset doesn't always have a matching date in both series. When a date is missing from one series, I want to use that series' last available value before the date (or 0 if there isn't one). E.g. for 2017-01-03 I would use y=3 and x=1 (from the earlier date) to get output = 3 + 1 = 4.

I have each timeseries in the form:

class Timeseries {
    List<Event> x = ...;
}

class Event {
    LocalDate date;
    Double value;
}

and have read them into a List<Timeseries> allSeries

I thought I might be able to sum them using streams

List<Timeseries> allSeries = ...
Map<LocalDate, Double> byDate = allSeries.stream()
    .flatMap(s -> s.getEvents().stream())
    .collect(Collectors.groupingBy(Event::getDate, Collectors.summingDouble(Event::getValue)));

But this wouldn't include the missing-date logic I mentioned above.

How else could I achieve this? (It doesn't have to be with streams.)

I'd say you need to expand the Timeseries class for the appropriate query function.

class Timeseries {
    // Event values indexed by date, kept in chronological order
    private final SortedMap<LocalDate, Double> eventValues = new TreeMap<>();
    private final List<Event> eventList;

    public Timeseries(List<Event> events) {
        events.forEach(e -> eventValues.put(e.getDate(), e.getValue()));
        eventList = new ArrayList<>(events);
    }

    public List<Event> getEvents() {
        return Collections.unmodifiableList(eventList);
    }

    public Double getValueByDate(LocalDate date) {
        Double value = eventValues.get(date);
        if (value == null) {
            // get values before the requested date
            SortedMap<LocalDate, Double> head = eventValues.headMap(date);
            value = head.isEmpty()
                ? 0.0   // none before
                : head.get(head.lastKey());  // closest before
        }
        return value;
    }
}

Then to merge

Map<LocalDate, Double> values = new TreeMap<>();
List<LocalDate> allDates = allSeries.stream()
    .flatMap(s -> s.getEvents().stream())
    .map(Event::getDate)
    .distinct()
    .collect(Collectors.toList());

for (LocalDate date : allDates) {
    for (Timeseries series : allSeries) {
        values.merge(date, series.getValueByDate(date), Double::sum);
    }
}
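
To check the lookup-and-merge idea against the numbers in the question, here is a minimal self-contained sketch. It assumes each series is held directly in a SortedMap keyed by date instead of the Timeseries/Event classes; the class name CarryForwardSumDemo and its helper methods are hypothetical and only serve the illustration.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class; names are illustrative, not part of the answer's API.
final class CarryForwardSumDemo {

    // Value on `date`, else the closest earlier value, else 0.0.
    static double valueOn(SortedMap<LocalDate, Double> series, LocalDate date) {
        Double exact = series.get(date);
        if (exact != null) {
            return exact;
        }
        // headMap(date) holds only the entries strictly before `date`
        SortedMap<LocalDate, Double> head = series.headMap(date);
        return head.isEmpty() ? 0.0 : head.get(head.lastKey());
    }

    // Sums all series on every date that occurs in at least one of them.
    static SortedMap<LocalDate, Double> sum(List<SortedMap<LocalDate, Double>> allSeries) {
        TreeSet<LocalDate> allDates = new TreeSet<>();
        allSeries.forEach(s -> allDates.addAll(s.keySet()));

        SortedMap<LocalDate, Double> totals = new TreeMap<>();
        for (LocalDate date : allDates) {
            for (SortedMap<LocalDate, Double> series : allSeries) {
                totals.merge(date, valueOn(series, date), Double::sum);
            }
        }
        return totals;
    }

    // The x and y series from the question (only the rows shown there).
    static SortedMap<LocalDate, Double> sampleTotals() {
        SortedMap<LocalDate, Double> x = new TreeMap<>();
        x.put(LocalDate.of(2017, 1, 1), 1.0);
        x.put(LocalDate.of(2017, 1, 5), 4.0);

        SortedMap<LocalDate, Double> y = new TreeMap<>();
        y.put(LocalDate.of(2017, 1, 3), 3.0);
        y.put(LocalDate.of(2017, 1, 4), 2.0);

        return sum(Arrays.asList(x, y));
    }

    public static void main(String[] args) {
        sampleTotals().forEach((date, total) -> System.out.println(date + " -> " + total));
    }
}
```

For the four sample rows this gives 1.0, 4.0, 3.0 and 6.0 for 2017-01-01, 01-03, 01-04 and 01-05, which matches the 3 + 1 = 4 example in the question.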

Edit: actually, the NavigableMap interface is even more useful here. It makes the missing-date case simpler, because floorEntry returns the entry for the date itself or, if there is none, the closest earlier one:

// assumes eventValues is declared as a NavigableMap<LocalDate, Double>
Map.Entry<LocalDate, Double> floor = eventValues.floorEntry(date);
Double value = floor != null ? floor.getValue() : 0.0;
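
The direction of the lookup matters for "last available value": on a NavigableMap, floorEntry looks at or before the key, while ceilingEntry looks at or after it. A small sketch of the difference, using the x series from the question; FloorLookupDemo and its methods are hypothetical names for illustration.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical demo class contrasting floorEntry and ceilingEntry.
final class FloorLookupDemo {

    // Latest value on or before `date`, or 0.0 if the series starts later.
    static double valueOn(NavigableMap<LocalDate, Double> series, LocalDate date) {
        Map.Entry<LocalDate, Double> floor = series.floorEntry(date);
        return floor != null ? floor.getValue() : 0.0;
    }

    // The x series from the question (only the rows shown there).
    static NavigableMap<LocalDate, Double> sampleX() {
        NavigableMap<LocalDate, Double> x = new TreeMap<>();
        x.put(LocalDate.of(2017, 1, 1), 1.0);
        x.put(LocalDate.of(2017, 1, 5), 4.0);
        return x;
    }

    public static void main(String[] args) {
        NavigableMap<LocalDate, Double> x = sampleX();
        LocalDate gap = LocalDate.of(2017, 1, 3); // x has no entry on this date

        // floorEntry carries the 2017-01-01 reading forward
        System.out.println("floor:   " + valueOn(x, gap));
        // ceilingEntry would instead pick up the later 2017-01-05 reading
        System.out.println("ceiling: " + x.ceilingEntry(gap).getValue());
        // before the first reading there is nothing to carry forward
        System.out.println("before:  " + valueOn(x, LocalDate.of(2016, 12, 31)));
    }
}
```

Because floorEntry also covers the exact-match case, the separate get(date) check is not needed with this variant.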

One way to do it is to make Event comparable by date and make use of TreeSet's floor method:

class Event implements Comparable<Event> {
        // ... 
        @Override
        public int compareTo(Event o) {
            return date.compareTo(o.date);
        }
}

Then, in the Timeseries class, use a TreeSet<Event> x instead of a List and pad it with a zero-valued sentinel entry, so that floor returns the sentinel when there is no previous value:

class Timeseries {
        public static final Event ZERO = new Event(LocalDate.of(1, 1, 1), 0d);
        TreeSet<Event> x = new TreeSet<>(Arrays.asList(ZERO));

        // ...
}

Now collect all known events and calculate the sums:

// Every distinct event date across all series (the TreeSet deduplicates by date)
TreeSet<Event> events = allSeries.stream()
        .flatMap(s -> s.getEvents().stream())
        .collect(Collectors.toCollection(TreeSet::new));

// For each date, sum the latest value on or before it in every series
Map<LocalDate, Double> sumsByDate = events.stream()
        .map(event -> new AbstractMap.SimpleEntry<>(
                event.getDate(),
                allSeries.stream()
                        .mapToDouble(a -> a.getEvents().floor(event).getValue())
                        .sum()))
        .filter(p -> !p.getKey().equals(Timeseries.ZERO.getDate()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
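
The sentinel trick can be tried in isolation with a small sketch like the following. It assumes a stripped-down Event with just a date and a value; SentinelFloorDemo, seriesOf and valueOn are hypothetical names introduced only for this illustration.

```java
import java.time.LocalDate;
import java.util.TreeSet;

// Hypothetical demo of TreeSet.floor with a zero-valued sentinel entry.
final class SentinelFloorDemo {

    // Minimal stand-in for the Event class, ordered by date only.
    static final class Event implements Comparable<Event> {
        final LocalDate date;
        final double value;

        Event(LocalDate date, double value) {
            this.date = date;
            this.value = value;
        }

        @Override
        public int compareTo(Event o) {
            return date.compareTo(o.date);
        }
    }

    // Sorts before any realistic date, so floor() always has something to return.
    static final Event ZERO = new Event(LocalDate.of(1, 1, 1), 0d);

    static TreeSet<Event> seriesOf(Event... events) {
        TreeSet<Event> set = new TreeSet<>();
        set.add(ZERO);
        for (Event e : events) {
            set.add(e);
        }
        return set;
    }

    // Latest value on or before `date`; the probe's own value is irrelevant
    // because the ordering only looks at the date.
    static double valueOn(TreeSet<Event> series, LocalDate date) {
        return series.floor(new Event(date, 0d)).value;
    }

    public static void main(String[] args) {
        TreeSet<Event> y = seriesOf(
                new Event(LocalDate.of(2017, 1, 3), 3d),
                new Event(LocalDate.of(2017, 1, 4), 2d));
        System.out.println(valueOn(y, LocalDate.of(2017, 1, 1))); // falls back to ZERO
        System.out.println(valueOn(y, LocalDate.of(2017, 1, 5))); // carries 2017-01-04 forward
    }
}
```

One thing to keep in mind with this design: since events compare by date only, a TreeSet silently keeps just the first event added for any given date within a series.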

So I managed to do this partially with streams. It doesn't seem particularly efficient though, since getRelevantValueFor rescans an entire series for every date. I would prefer a more efficient solution.

public Timeseries combine(List<Timeseries> allSeries) {

    // Get a unique set of all the dates across all time series
    Set<LocalDate> allDates = allSeries.stream()
            .flatMap(t -> t.getEvents().stream())
            .map(Event::getDate)
            .collect(Collectors.toSet());

    Timeseries output = new Timeseries();

    // For each date sum up the latest event in each timeseries
    allDates.forEach(date -> {
        double total = 0;
        for(Timeseries series : allSeries) {
            total += getRelevantValueFor(series, date).orElse(0.0);
        }
        output.add(new Event(date, total));
    });
    return output;
}

private Optional<Double> getRelevantValueFor(Timeseries series, LocalDate date) {
    // Latest event on or before the requested date, if any
    return series.getEvents().stream()
            .filter(event -> !event.getDate().isAfter(date))
            .max(ascendingOrder())
            .map(Event::getValue);
}

private Comparator<Event> ascendingOrder() {
    // Orders events chronologically by date
    return Comparator.comparing(Event::getDate);
}
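
One possible way to avoid the rescanning, offered as a sketch and not as a tested replacement: record, for each event, how much its series changes on that date (the new value minus that series' previous value), add those changes up per date, and then keep a running total over the sorted dates. This assumes each series is available as a NavigableMap keyed by date; DeltaSweepDemo and its methods are hypothetical names.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical demo: sum carried-forward series via per-date deltas.
final class DeltaSweepDemo {

    static NavigableMap<LocalDate, Double> sum(List<NavigableMap<LocalDate, Double>> allSeries) {
        // Step 1: per date, the net change contributed by all series
        NavigableMap<LocalDate, Double> deltas = new TreeMap<>();
        for (NavigableMap<LocalDate, Double> series : allSeries) {
            double previous = 0.0; // a series counts as 0 before its first event
            for (Map.Entry<LocalDate, Double> e : series.entrySet()) {
                deltas.merge(e.getKey(), e.getValue() - previous, Double::sum);
                previous = e.getValue();
            }
        }

        // Step 2: a running total over the sorted dates gives the combined value
        NavigableMap<LocalDate, Double> totals = new TreeMap<>();
        double running = 0.0;
        for (Map.Entry<LocalDate, Double> e : deltas.entrySet()) {
            running += e.getValue();
            totals.put(e.getKey(), running);
        }
        return totals;
    }

    // The x and y series from the question (only the rows shown there).
    static NavigableMap<LocalDate, Double> sampleTotals() {
        NavigableMap<LocalDate, Double> x = new TreeMap<>();
        x.put(LocalDate.of(2017, 1, 1), 1.0);
        x.put(LocalDate.of(2017, 1, 5), 4.0);

        NavigableMap<LocalDate, Double> y = new TreeMap<>();
        y.put(LocalDate.of(2017, 1, 3), 3.0);
        y.put(LocalDate.of(2017, 1, 4), 2.0);

        return sum(Arrays.asList(x, y));
    }

    public static void main(String[] args) {
        sampleTotals().forEach((date, total) -> System.out.println(date + " -> " + total));
    }
}
```

Each event is touched once plus one TreeMap insertion, so the work grows roughly as N log N in the total number of events, instead of (number of dates) × (number of events). The trade-off is that the totals are built from accumulated differences, so with non-integer values they may differ from a direct per-date sum by floating-point rounding.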
