
What is the best approach to create aggregation tables?

Posted 2013-07-14 07:37:41 · Tags: java, aggregate, hsqldb

I have data being collected every second and stored in HSQLDB. I need aggregated data (per 15 seconds, per minute, etc.) for each metric in the collected data. What is the best approach to calculate the aggregate values, and when should they be stored in the DB? Should I calculate the values online and store them in the DB every 15 seconds? Or should I query the DB for the latest results and calculate the aggregates from those? Should I use the small aggregation (15 sec) to calculate the larger one (1 min)? Are there free Java tools for this?

2 answers

From previous experience, I would suggest using a real-time database, probably a non-relational one with built-in support for time series. That way, you should be able to avoid storing calculated aggregate data. With a relational database, you will quickly end up with millions of rows that are difficult to manage and slow to access. Your other option is to denormalize your data and store each hour of data in a single row, in a BLOB column (in binary format).
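To make the BLOB option more concrete, here is a minimal sketch of how one hour of 1-second samples could be packed into a byte array and read back. The class name HourBlock, the fixed 3600-sample layout and the choice of 8-byte doubles are all assumptions for illustration, not something the answer prescribes.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: one hour of 1-second samples as a single binary block. */
public final class HourBlock {
    public static final int SAMPLES_PER_HOUR = 3600;

    private HourBlock() {}

    /** Encodes the samples as consecutive 8-byte big-endian doubles. */
    public static byte[] pack(double[] samples) {
        if (samples.length != SAMPLES_PER_HOUR) {
            throw new IllegalArgumentException("expected " + SAMPLES_PER_HOUR + " samples");
        }
        ByteBuffer buf = ByteBuffer.allocate(SAMPLES_PER_HOUR * Double.BYTES);
        for (double s : samples) {
            buf.putDouble(s);
        }
        return buf.array();
    }

    /** Decodes a block produced by pack(). */
    public static double[] unpack(byte[] blob) {
        if (blob.length != SAMPLES_PER_HOUR * Double.BYTES) {
            throw new IllegalArgumentException("unexpected block size: " + blob.length);
        }
        double[] out = new double[SAMPLES_PER_HOUR];
        ByteBuffer.wrap(blob).asDoubleBuffer().get(out);
        return out;
    }

    /** Average of the samples in [fromSecond, toSecond) within the hour. */
    public static double average(double[] samples, int fromSecond, int toSecond) {
        if (fromSecond < 0 || toSecond > samples.length || fromSecond >= toSecond) {
            throw new IllegalArgumentException("bad range");
        }
        double sum = 0;
        for (int i = fromSecond; i < toSecond; i++) {
            sum += samples[i];
        }
        return sum / (toSecond - fromSecond);
    }
}
```

The resulting byte[] could be written with PreparedStatement.setBytes and read with ResultSet.getBytes. Aggregates for any sub-range are then computed in memory after unpacking, so the database only ever sees one row per metric per hour.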

You can use HSQLDB in MVCC mode for concurrent reads and writes.
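As a rough sketch of how MVCC mode might be switched on from Java: HSQLDB 2.x documents the hsqldb.tx connection property and the SET DATABASE TRANSACTION CONTROL statement. The class name HsqldbMvcc and the database path are hypothetical, and it is worth checking the guide for your HSQLDB version to see when the URL property takes effect.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper for selecting the MVCC transaction model in HSQLDB 2.x. */
public final class HsqldbMvcc {
    private HsqldbMvcc() {}

    /** Builds a file-database JDBC URL that requests MVCC; dbPath is an assumed location. */
    public static String mvccUrl(String dbPath) {
        return "jdbc:hsqldb:file:" + dbPath + ";hsqldb.tx=mvcc";
    }

    /** Switches an already open database to MVCC with a SQL statement. */
    public static void enableMvcc(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("SET DATABASE TRANSACTION CONTROL MVCC");
        }
    }
}
```

A connection would then be opened with something like DriverManager.getConnection(HsqldbMvcc.mvccUrl("data/metrics"), "SA", ""), assuming the HSQLDB driver is on the classpath and the default SA user is in use.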

Provided the raw-data table has an indexed timestamp column, calculating aggregates over a time range with a SELECT statement is very fast. Because SELECT statements with aggregate calculations can run concurrently, you can use separate threads to perform the operation every 1 second and every 15 seconds.
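A minimal sketch of such a range aggregate, assuming a table raw_samples(metric VARCHAR, ts TIMESTAMP, val DOUBLE) with an index on ts. The table, the column names and the WindowAggregator class are all made up for illustration. The query covers the last complete window, so a 15-second and a 1-minute job can share the same code with a different window length.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

/** Hypothetical aggregator over fixed time windows of an assumed raw_samples table. */
public final class WindowAggregator {
    // Assumed schema: raw_samples(metric VARCHAR, ts TIMESTAMP, val DOUBLE), index on ts.
    static final String AGG_SQL =
        "SELECT AVG(val), MIN(val), MAX(val) FROM raw_samples "
      + "WHERE metric = ? AND ts >= ? AND ts < ?";

    private WindowAggregator() {}

    /** Start (epoch millis) of the window that contains epochMillis. */
    public static long windowStart(long epochMillis, long windowMillis) {
        return epochMillis - Math.floorMod(epochMillis, windowMillis);
    }

    /** Returns {avg, min, max} for the last complete window, or null if it holds no rows. */
    public static double[] aggregateLastWindow(Connection conn, String metric,
                                               long nowMillis, long windowMillis)
            throws SQLException {
        long end = windowStart(nowMillis, windowMillis); // exclusive upper bound
        long start = end - windowMillis;                 // inclusive lower bound
        try (PreparedStatement ps = conn.prepareStatement(AGG_SQL)) {
            ps.setString(1, metric);
            ps.setTimestamp(2, new Timestamp(start));
            ps.setTimestamp(3, new Timestamp(end));
            try (ResultSet rs = ps.executeQuery()) {
                rs.next(); // an aggregate query without GROUP BY returns exactly one row
                double avg = rs.getDouble(1);
                if (rs.wasNull()) {
                    return null; // no samples fell inside the window
                }
                return new double[] {avg, rs.getDouble(2), rs.getDouble(3)};
            }
        }
    }
}
```

Each period could be driven by its own task on a ScheduledExecutorService, for example scheduleAtFixedRate with a 15-second period calling aggregateLastWindow(conn, "cpu", System.currentTimeMillis(), 15_000), where "cpu" is a placeholder metric name. Giving each thread its own Connection keeps the concurrent SELECTs independent.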