I have multiple time series:
x
| date | value |
| 2017-01-01 | 1 |
| 2017-01-05 | 4 |
| ... | ... |
y
| date | value |
| 2017-01-03 | 3 |
| 2017-01-04 | 2 |
| ... | ... |
Frustratingly, my dataset doesn't always have a matching date in both series. When a date is missing from one series, I want to use that series' most recent earlier value (or 0 if there isn't one). E.g. for 2017-01-03 I would use y=3 and x=1 (from the date before) to get output = 3 + 1 = 4.
I have each timeseries in the form:
class Timeseries {
List<Event> x = ...;
}
class Event {
LocalDate date;
Double value;
}
and have read them into a List<Timeseries> allSeries.
I thought I might be able to sum them using streams:
List<Timeseries> allSeries = ...;
Map<LocalDate, Double> byDate = allSeries.stream()
.flatMap(s -> s.getEvents().stream())
.collect(Collectors.groupingBy(Event::getDate,Collectors.summingDouble(Event::getValue)));
But this wouldn't have the missing-date logic I mentioned above.
How else could I achieve this? (It doesn't have to be with streams.)
I'd say you need to extend the Timeseries class with an appropriate query method.
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
class Timeseries {
// Values keyed by date and kept sorted (assumes Event.getValue() returns Integer)
private final SortedMap<LocalDate, Integer> eventValues = new TreeMap<>();
private final List<Event> eventList;
public Timeseries(List<Event> events) {
events.forEach(e -> eventValues.put(e.getDate(), e.getValue()));
eventList = new ArrayList<>(events);
}
public List<Event> getEvents() {
return Collections.unmodifiableList(eventList);
}
public Integer getValueByDate(LocalDate date) {
Integer value = eventValues.get(date);
if (value == null) {
// no exact match: look at the entries strictly before the requested date
SortedMap<LocalDate, Integer> head = eventValues.headMap(date);
value = head.isEmpty()
? 0 // nothing earlier
: head.get(head.lastKey()); // latest earlier value
}
return value;
}
}
Then, to merge:
Map<LocalDate, Integer> values = new TreeMap<>();
List<LocalDate> allDates = allSeries.stream()
.flatMap(s -> s.getEvents().stream())
.map(Event::getDate)
.distinct()
.collect(Collectors.toList());
for (LocalDate date : allDates) {
for (Timeseries series : allSeries) {
// adds each series' carried-forward value to the running total for this date
values.merge(date, series.getValueByDate(date), Integer::sum);
}
}
Edit: actually, the NavigableMap interface is even more useful in this case; it handles the missing-data case in a single lookup:
// eventValues must be declared as NavigableMap<LocalDate, Integer> (TreeMap implements it)
// floorEntry returns the entry with the greatest key <= date, or null if there is none
Map.Entry<LocalDate, Integer> floor = eventValues.floorEntry(date);
Integer value = floor != null ? floor.getValue() : 0;
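Pulling this answer together, the lookup and the merge can be sketched end to end as follows. This is only a sketch under a few assumptions: values are `double` as in the question rather than `Integer`, each series is represented directly as a `NavigableMap<LocalDate, Double>`, and the names `CarryForwardSum`, `valueAt`, and `sum` are hypothetical helpers that do not appear in the original classes.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper class, for illustration only
final class CarryForwardSum {

    // Latest value at or before the date, or 0.0 if the series has no value yet
    static double valueAt(NavigableMap<LocalDate, Double> series, LocalDate date) {
        Map.Entry<LocalDate, Double> floor = series.floorEntry(date);
        return floor != null ? floor.getValue() : 0.0;
    }

    // For every date that occurs in any series, sum the carried-forward values
    static NavigableMap<LocalDate, Double> sum(List<NavigableMap<LocalDate, Double>> allSeries) {
        NavigableMap<LocalDate, Double> totals = new TreeMap<>();
        for (NavigableMap<LocalDate, Double> series : allSeries) {
            for (LocalDate date : series.keySet()) {
                // Compute each date once, even if several series contain it
                totals.computeIfAbsent(date,
                        d -> allSeries.stream().mapToDouble(s -> valueAt(s, d)).sum());
            }
        }
        return totals;
    }
}
```

For the four sample rows shown in the question, `sum` maps 2017-01-01 to 1.0, 2017-01-03 to 4.0, 2017-01-04 to 3.0, and 2017-01-05 to 6.0. Each lookup is a tree search, so no series is rescanned from the start.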
One way to do it is to make Event comparable by date and use TreeSet's floor method:
class Event implements Comparable<Event> {
// ...
@Override
public int compareTo(Event o) {
return date.compareTo(o.date);
}
}
Then, in the Timeseries class, use a TreeSet<Event> x instead of a List, and pad it with a zero-valued sentinel entry so that floor returns it when there is no previous value:
class Timeseries {
public static final Event ZERO = new Event(LocalDate.of(1, 1, 1), 0d);
TreeSet<Event> x = new TreeSet<>(Arrays.asList(ZERO));
// ...
}
Now collect all known events and calculate the sums:
// getEvents() must return the TreeSet<Event> here so that floor is available
TreeSet<Event> events = allSeries.stream()
.flatMap(s -> s.getEvents().stream())
.collect(Collectors.toCollection(TreeSet::new));
Map<LocalDate, Double> sumsByDate = events.stream()
.filter(event -> !event.getDate().equals(Timeseries.ZERO.getDate())) // skip the padding entry
.collect(Collectors.toMap(
Event::getDate,
event -> allSeries.stream().mapToDouble(s -> s.getEvents().floor(event).getValue()).sum()));
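To see the sentinel trick in isolation, here is a minimal sketch of the lookup on its own. `DatedValue` is a hypothetical stand-in for `Event`, `FloorLookup`, `newSeries`, and `valueAt` are made-up names, and the sentinel uses `LocalDate.MIN` instead of `LocalDate.of(1, 1, 1)`; none of these come from the answer above.

```java
import java.time.LocalDate;
import java.util.TreeSet;

// Hypothetical stand-in for Event, ordered by date only
final class DatedValue implements Comparable<DatedValue> {
    final LocalDate date;
    final double value;

    DatedValue(LocalDate date, double value) {
        this.date = date;
        this.value = value;
    }

    @Override
    public int compareTo(DatedValue other) {
        return date.compareTo(other.date);
    }
}

final class FloorLookup {
    // Sentinel that sorts before every real date, so floor() never returns null
    static final DatedValue ZERO = new DatedValue(LocalDate.MIN, 0.0);

    // Creates an empty series that is already padded with the sentinel
    static TreeSet<DatedValue> newSeries() {
        TreeSet<DatedValue> series = new TreeSet<>();
        series.add(ZERO);
        return series;
    }

    // Value of the latest element dated on or before the given date
    static double valueAt(TreeSet<DatedValue> series, LocalDate date) {
        // The probe's value is irrelevant: ordering looks at the date only
        return series.floor(new DatedValue(date, 0.0)).value;
    }
}
```

One caveat of ordering by date alone: a `TreeSet` treats two elements with the same date as duplicates, so a second event on the same date in one series is silently ignored by `add`.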
So I managed to do this partially with streams. It doesn't seem particularly efficient, though, as the getRelevantValueFor method rescans the whole series for every date. I would prefer a more efficient solution.
public Timeseries combine(List<Timeseries> allSeries) {
// Get a unique set of all the dates across all time series
Set<LocalDate> allDates = allSeries.stream().flatMap(t -> t.getEvents().stream()).map(Event::getDate).collect(Collectors.toSet());
Timeseries output = new Timeseries();
// For each date sum up the latest event in each timeseries
allDates.forEach(date -> {
double total = 0;
for(Timeseries series : allSeries) {
total += getRelevantValueFor(series, date).orElse(0.0);
}
output.add(new Event(date, total));
});
return output;
}
private Optional<Double> getRelevantValueFor(Timeseries series, LocalDate date) {
return series.getEvents().stream().filter(event -> !event.getDate().isAfter(date)).max(ascendingOrder()).map(Event::getValue);
}
private Comparator<Event> ascendingOrder() {
// LocalDate has no toEpochMilli(); it is Comparable, so compare the dates directly
return Comparator.comparing(Event::getDate);
}
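One possible way to avoid the repeated scans is a single forward sweep: visit the dates in ascending order and keep one cursor per series that only ever moves forward. The sketch below assumes each series is already available as a date-sorted `NavigableMap<LocalDate, Double>`; `SweepSum` and `combine` are hypothetical names, not part of the code above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, for illustration only
final class SweepSum {

    static NavigableMap<LocalDate, Double> combine(List<NavigableMap<LocalDate, Double>> allSeries) {
        // Every date that occurs in any series, in ascending order
        TreeSet<LocalDate> allDates = new TreeSet<>();
        allSeries.forEach(s -> allDates.addAll(s.keySet()));

        // Copy each series into an indexable, date-ordered list of entries
        List<List<Map.Entry<LocalDate, Double>>> events = new ArrayList<>();
        for (NavigableMap<LocalDate, Double> s : allSeries) {
            events.add(new ArrayList<>(s.entrySet()));
        }

        int n = allSeries.size();
        int[] next = new int[n];          // index of the next unread event per series
        double[] current = new double[n]; // carried-forward value per series, 0.0 until its first event

        NavigableMap<LocalDate, Double> totals = new TreeMap<>();
        for (LocalDate date : allDates) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                List<Map.Entry<LocalDate, Double>> series = events.get(i);
                // Consume every event of series i dated on or before this date
                while (next[i] < series.size() && !series.get(next[i]).getKey().isAfter(date)) {
                    current[i] = series.get(next[i]).getValue();
                    next[i]++;
                }
                total += current[i];
            }
            totals.put(date, total);
        }
        return totals;
    }
}
```

Because the cursors never move backwards, each event is read once, and the remaining work is one addition per series per date. The input maps must not be modified while `combine` runs.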