
How to calculate aggregate values in a distributed architecture

I have a cluster of Web applications (Java + Tomcat), and the apps generate events. The volume is not that high: somewhere under 10 million events per day, unevenly distributed with peaks and valleys.

We need to display calculated aggregates of these events in the user interface. Currently, on each page display, this is done by running DB queries against a large table with many indexes.

Is there a good architectural approach to keeping a flow of events while also calculating (on the fly) and storing aggregate numbers such as average, mean, min, max, etc.?

Real time is not important, but near-real time is a must. For instance, a latency of under 1 minute is acceptable.
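A common building block for the "on the fly" aggregates asked about here is a small mergeable running summary that tracks count, sum, min, and max and derives the average on demand. The following is a minimal Java sketch; the class and method names (EventAggregate, accept, merge) are illustrative, not from any specific library:

```java
// Minimal sketch of a mergeable running summary (count, sum, min, max).
// Names are illustrative only.
public final class EventAggregate {
    private long count = 0;
    private double sum = 0.0;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Fold a single event value into the running summary. */
    public synchronized void accept(double value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    /** Combine with another partial summary, e.g. one reported by another node.
        Assumes 'other' is a finished snapshot that is no longer being updated. */
    public synchronized void merge(EventAggregate other) {
        count += other.count;
        sum += other.sum;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
    }

    public synchronized long count() { return count; }

    public synchronized double min() { return min; }

    public synchronized double max() { return max; }

    public synchronized double average() {
        return count == 0 ? 0.0 : sum / count;
    }
}
```

Because two such summaries can be merged, the same structure works whether each node pushes partial summaries or a central process pulls and combines them; only the transport differs.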

You can go with a push model or a pull model. (Or proactive/reactive if you like those terms better.) In both cases you've got a centralized records keeper that must aggregate the data you want. In the push model your decentralized services/servers/applications will periodically push updates to your records keeper. In the pull model your records keeper will periodically query your decentralized services and request updates.

In a push scenario, each independent service/server/application keeps its own event log and counter. Once the counter ticks over a certain threshold, it notifies the records keeper of the new status. For example, it could push an update every 100 or 1000 or delta events. Thus (assuming there are no undetectable failures), the records keeper always knows how many events have occurred in the system, plus or minus your delta. This gives great performance, since whenever someone wants to access the event records, all of the data is already aggregated. One downside is that there's a low but persistent overhead imposed on the system. Another is that you never know whether a service has failed or whether it just hasn't had a lot of events recently (plus/minus delta).
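As a rough illustration of the push model, each application could do something like the following. The names PushingEventTracker and RecordsKeeperClient are made up for this sketch; the transport behind RecordsKeeperClient could be HTTP, a message queue, or whatever you already use:

```java
import java.util.concurrent.atomic.AtomicLong;

// Rough sketch of the push model: the service counts events locally and reports
// its cumulative count to the records keeper every DELTA events, so the keeper
// is never more than DELTA events behind this service.
public final class PushingEventTracker {

    /** Hypothetical transport to the records keeper. */
    public interface RecordsKeeperClient {
        void pushCount(String serviceId, long cumulativeCount);
    }

    private static final long DELTA = 1000;          // report every 1000 events
    private final AtomicLong counter = new AtomicLong();
    private final String serviceId;
    private final RecordsKeeperClient keeper;

    public PushingEventTracker(String serviceId, RecordsKeeperClient keeper) {
        this.serviceId = serviceId;
        this.keeper = keeper;
    }

    /** Called by the application whenever an event occurs. */
    public void onEvent() {
        long n = counter.incrementAndGet();
        if (n % DELTA == 0) {
            keeper.pushCount(serviceId, n);          // cumulative count so far
        }
    }
}
```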

In the pull scenario your decentralized services still keep logs, but they don't do anything until the records keeper requests an update. When you want to know the state of the system the records keeper must query everyone in the system, get their responses, and assemble the results. This is probably the easiest thing to implement, and one positive aspect is that there is zero system overhead until you actually request an update. The downside is that update requests can cause a big drag on the system when they occur (since everyone drops everything and you generate traffic throughout the entire system). For this same reason it'll take a while to generate updates when the request comes in.
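A minimal sketch of the pull model, again with made-up names (PollingRecordsKeeper, ServiceEndpoint) standing in for whatever remote call your services actually expose:

```java
import java.util.List;

// Rough sketch of the pull model: the records keeper polls every service on
// demand and assembles the total. There is no cost until totalEvents() is
// called, but each call touches every service in the system.
public final class PollingRecordsKeeper {

    /** Hypothetical remote view of one service's local counter (HTTP, RMI, JMX, ...). */
    public interface ServiceEndpoint {
        long fetchEventCount();
    }

    private final List<ServiceEndpoint> services;

    public PollingRecordsKeeper(List<ServiceEndpoint> services) {
        this.services = services;
    }

    /** Query everyone and assemble the result. */
    public long totalEvents() {
        return services.stream()
                       .mapToLong(ServiceEndpoint::fetchEventCount)
                       .sum();
    }
}
```

Polling the endpoints in parallel and with timeouts would soften the drag described above, at the cost of extra complexity.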

Now, both of these approaches are independent of implementation methodology. Either one might be implemented with a completely flat topology, where every service communicates directly with your records keeper. Alternatively, you might form a hierarchy of services, so that each parent in the hierarchy is responsible for aggregating the data of its children. What you want to do in this respect really depends on exactly how fast and efficient the system needs to be.
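To make the hierarchical option concrete, here is a toy sketch. AggregationNode is an illustrative name; in practice each level would sit on a different machine and the recursion would be a remote call:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the hierarchical variant: each parent aggregates its children's
// counts before reporting upward, so the root never has to contact every leaf.
public final class AggregationNode {
    private long localCount = 0;                      // events seen by this node itself
    private final List<AggregationNode> children = new ArrayList<>();

    public void addChild(AggregationNode child) {
        children.add(child);
    }

    public void onEvent() {
        localCount++;
    }

    /** Count for this node's whole subtree; calling this on the root covers the system. */
    public long subtreeCount() {
        long total = localCount;
        for (AggregationNode child : children) {
            total += child.subtreeCount();
        }
        return total;
    }
}
```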
