
How to calculate aggregate values in a distributed architecture

I have a cluster of Web applications (Java + Tomcat), and the apps generate events. The volume is not that high: somewhere under 10 million events per day, unevenly distributed with peaks and valleys.

We need to display calculated aggregates of these events in the user interface. Currently, on each page display, this is done by running DB queries against a large table with many indexes.

Is there a good architectural approach to keeping a flow of events while also calculating (on the fly) and storing aggregate numbers such as average, mean, min, max, etc.?

Real time is not important, but near-real time is a must. For instance, a latency of under 1 minute is acceptable.
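A common building block for the "on the fly" aggregates asked about here is a small mergeable running summary that tracks count, sum, min, and max and derives the average on demand. The following is a minimal Java sketch; the class and method names (EventAggregate, accept, merge) are illustrative, not from any specific library:

```java
// Minimal sketch of a mergeable running summary (count, sum, min, max).
// Names are illustrative only.
public final class EventAggregate {
    private long count = 0;
    private double sum = 0.0;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Fold a single event value into the running summary. */
    public synchronized void accept(double value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    /** Combine with another partial summary, e.g. one reported by another node.
        Assumes 'other' is a finished snapshot that is no longer being updated. */
    public synchronized void merge(EventAggregate other) {
        count += other.count;
        sum += other.sum;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
    }

    public synchronized long count() { return count; }

    public synchronized double min() { return min; }

    public synchronized double max() { return max; }

    public synchronized double average() {
        return count == 0 ? 0.0 : sum / count;
    }
}
```

Because two such summaries can be merged, the same structure works whether each node pushes partial summaries or a central process pulls and combines them; only the transport differs.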

You can go with a push model or a pull model. (Or proactive/reactive if you like those terms better.) In both cases you've got a centralized records keeper that must aggregate the data you want. In the push model your decentralized services/servers/applications will periodically push updates to your records keeper. In the pull model your records keeper will periodically query your decentralized services and request updates.

In a push scenario, each independent service/server/application keeps its own event log and counter. Once the counter ticks over a certain threshold, it notifies the records keeper of the new status. For example, it could push an update every 100 or 1000 or delta events. Thus (assuming there are no undetectable failures), the records keeper always knows how many events have occurred in the system, plus or minus your delta. This gives great performance, since whenever someone wants to access the event records, all of the data is already aggregated. One downside is that there's a low but persistent overhead imposed on the system. Another is that you never know whether a service has failed or whether it just hasn't had a lot of events recently (plus/minus delta).
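As a rough illustration of the push model, each application could do something like the following. The names PushingEventTracker and RecordsKeeperClient are made up for this sketch; the transport behind RecordsKeeperClient could be HTTP, a message queue, or whatever you already use:

```java
import java.util.concurrent.atomic.AtomicLong;

// Rough sketch of the push model: the service counts events locally and reports
// its cumulative count to the records keeper every DELTA events, so the keeper
// is never more than DELTA events behind this service.
public final class PushingEventTracker {

    /** Hypothetical transport to the records keeper. */
    public interface RecordsKeeperClient {
        void pushCount(String serviceId, long cumulativeCount);
    }

    private static final long DELTA = 1000;          // report every 1000 events
    private final AtomicLong counter = new AtomicLong();
    private final String serviceId;
    private final RecordsKeeperClient keeper;

    public PushingEventTracker(String serviceId, RecordsKeeperClient keeper) {
        this.serviceId = serviceId;
        this.keeper = keeper;
    }

    /** Called by the application whenever an event occurs. */
    public void onEvent() {
        long n = counter.incrementAndGet();
        if (n % DELTA == 0) {
            keeper.pushCount(serviceId, n);          // cumulative count so far
        }
    }
}
```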

In the pull scenario your decentralized services still keep logs, but they don't do anything until the records keeper requests an update. When you want to know the state of the system the records keeper must query everyone in the system, get their responses, and assemble the results. This is probably the easiest thing to implement, and one positive aspect is that there is zero system overhead until you actually request an update. The downside is that update requests can cause a big drag on the system when they occur (since everyone drops everything and you generate traffic throughout the entire system). For this same reason it'll take a while to generate updates when the request comes in.
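A minimal sketch of the pull model, again with made-up names (PollingRecordsKeeper, ServiceEndpoint) standing in for whatever remote call your services actually expose:

```java
import java.util.List;

// Rough sketch of the pull model: the records keeper polls every service on
// demand and assembles the total. There is no cost until totalEvents() is
// called, but each call touches every service in the system.
public final class PollingRecordsKeeper {

    /** Hypothetical remote view of one service's local counter (HTTP, RMI, JMX, ...). */
    public interface ServiceEndpoint {
        long fetchEventCount();
    }

    private final List<ServiceEndpoint> services;

    public PollingRecordsKeeper(List<ServiceEndpoint> services) {
        this.services = services;
    }

    /** Query everyone and assemble the result. */
    public long totalEvents() {
        return services.stream()
                       .mapToLong(ServiceEndpoint::fetchEventCount)
                       .sum();
    }
}
```

Polling the endpoints in parallel and with timeouts would soften the drag described above, at the cost of extra complexity.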

Now, both of these approaches are independent of implementation methodology. Either one might be implemented with a completely flat topology, where every service communicates directly with your records keeper. Alternatively, you might form a hierarchy of services, so that each parent in the hierarchy is responsible for aggregating the data of its children. What you want to do in this respect really depends on exactly how fast and efficient the system needs to be.
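To make the hierarchical option concrete, here is a toy sketch. AggregationNode is an illustrative name; in practice each level would sit on a different machine and the recursion would be a remote call:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the hierarchical variant: each parent aggregates its children's
// counts before reporting upward, so the root never has to contact every leaf.
public final class AggregationNode {
    private long localCount = 0;                      // events seen by this node itself
    private final List<AggregationNode> children = new ArrayList<>();

    public void addChild(AggregationNode child) {
        children.add(child);
    }

    public void onEvent() {
        localCount++;
    }

    /** Count for this node's whole subtree; calling this on the root covers the system. */
    public long subtreeCount() {
        long total = localCount;
        for (AggregationNode child : children) {
            total += child.subtreeCount();
        }
        return total;
    }
}
```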
