Trending 100 million+ rows

I have a system that records some measured values every second. What is the best way to store trend data, that is, values that each correspond to a specific second?

1 day = 86,400 seconds
1 month (30 days) = 2,592,000 seconds

There are around 1000 values to keep track of every second.

Currently there are 50 tables, each grouping the trend data into 20 columns. These tables contain more than 100 million rows.

    TREND_TIME datetime (clustered_index)
    TREND_DATA1 real
    TREND_DATA2 real
    ...
    TREND_DATA20 real
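To put the numbers above in perspective, here is a rough sizing sketch. It assumes SQL Server-style column sizes (8 bytes for `datetime`, 4 bytes for `real`), which the question does not confirm, and it ignores row and index overhead, so treat the result as a lower bound rather than a measurement.

```python
# Back-of-the-envelope sizing for the layout described in the question.
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY   # 2,592,000 (30-day month)

TABLES = 50
COLUMNS_PER_TABLE = 20
DATETIME_BYTES = 8   # assumption: SQL Server datetime
REAL_BYTES = 4       # assumption: SQL Server real


def raw_bytes(seconds: int) -> int:
    """Payload bytes across all tables, one row per second per table.

    Row headers, null bitmaps and index pages are deliberately ignored.
    """
    row_bytes = DATETIME_BYTES + COLUMNS_PER_TABLE * REAL_BYTES
    return seconds * TABLES * row_bytes


def days_to_fill(rows: int) -> float:
    """Days until a single table holds `rows` rows at one row per second."""
    return rows / SECONDS_PER_DAY


print(f"per day:   {raw_bytes(SECONDS_PER_DAY) / 1e6:.1f} MB")
print(f"per month: {raw_bytes(SECONDS_PER_MONTH) / 1e9:.1f} GB")
print(f"100M rows in one table after {days_to_fill(100_000_000):.0f} days")
```

Under these assumptions the raw payload grows by a few hundred megabytes per day, which is why the answers below focus on aggregation and compression.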

Have you considered RRDTool? It provides a round-robin database, or circular buffer, for time series data. You can store data at whatever interval you like, then define consolidation points and a consolidation function (for example sum, min, max, avg) for a given period: 1 second, 5 seconds, 2 days, etc. Because it knows which consolidation points you want, it doesn't need to keep all the data points once they've been aggregated.

Ganglia and Cacti use this under the covers, and it's quite easy to use from many languages.

If you do need all the data points, consider using it just for the aggregation.
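The consolidation idea can be sketched in plain Python. This is not RRDTool's API or file format; `consolidate` and `RoundRobinArchive` are hypothetical names used only to illustrate the concept of reducing per-second samples to one point per period and keeping a fixed number of those points.

```python
from collections import deque
from statistics import mean


def consolidate(samples, step, func):
    """Reduce (timestamp, value) samples to one point per `step` seconds.

    `func` is the consolidation function, e.g. min, max, sum or mean.
    """
    buckets = {}
    for ts, value in samples:
        bucket_start = ts - ts % step          # align to the period boundary
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, func(values)) for start, values in sorted(buckets.items())]


class RoundRobinArchive:
    """Fixed-size archive: once full, each new point displaces the oldest."""

    def __init__(self, rows):
        self.points = deque(maxlen=rows)

    def extend(self, points):
        self.points.extend(points)


# 20 seconds of made-up data: the value is simply t modulo 10.
samples = [(t, float(t % 10)) for t in range(20)]

five_sec_avg = consolidate(samples, step=5, func=mean)
five_sec_max = consolidate(samples, step=5, func=max)

archive = RoundRobinArchive(rows=3)            # keep only the 3 newest points
archive.extend(five_sec_avg)
print(list(archive.points))
```

Because the archive has a fixed number of rows, its size never grows, at the cost of losing the raw per-second values once they fall out of the window.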

I would change the data saving approach: instead of saving 'raw' data as individual values, I would buffer 5-20 minutes of data in an array (in memory, on the BL side), compress that array using an LZ-based algorithm, and then store it in the database as binary data. It would also be nice to save Max/Min/Avg/etc. info for each binary chunk.

When you want to process the data, you can work through it chunk by chunk, which keeps your application's memory profile low. This approach is a little more complex but very scalable in terms of memory/processing.
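A minimal sketch of the chunking idea, assuming 4-byte floats (to match `real`) and Python's `zlib`, whose DEFLATE format is LZ77-based. The function names and the summary fields are illustrative, not something the answer specifies; the resulting bytes would go into a binary column alongside the summary.

```python
import struct
import zlib
from statistics import mean


def pack_chunk(values):
    """Compress a list of floats; return (blob, summary) for storage."""
    raw = struct.pack(f"<{len(values)}f", *values)   # little-endian float32
    blob = zlib.compress(raw)
    summary = {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
    }
    return blob, summary


def unpack_chunk(blob):
    """Inverse of pack_chunk: decompress and decode back to a list of floats."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Five minutes of made-up per-second readings for one signal.
values = [20.0 + (i % 8) * 0.25 for i in range(300)]

blob, summary = pack_chunk(values)
restored = unpack_chunk(blob)
print(len(blob), "bytes compressed,", 4 * len(values), "bytes raw")
```

Queries that only need Max/Min/Avg can read the summary and skip decompression entirely; only detailed analysis has to unpack a chunk. How well real data compresses depends on how noisy the signals are.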

Hope this helps.

Is the problem the database schema?

A one-second-to-many-trends relationship obviously suggests, as a first option, a separate table with a foreign key to a seconds table. Alternatively, if the "many trend values" are represented by columns rather than rows, you can always append the columns to the seconds table and incur null values.

Have you tried that? Was performance poor?
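The first option (a seconds table plus a child table keyed by second and trend) might look like the sketch below. It uses an in-memory SQLite database only as a stand-in for whatever database the question runs on, and the table and column names (`trend_second`, `trend_value`, `trend_id`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE trend_second (
        trend_time INTEGER PRIMARY KEY      -- one row per second (Unix time)
    );
    CREATE TABLE trend_value (
        trend_time INTEGER NOT NULL REFERENCES trend_second(trend_time),
        trend_id   INTEGER NOT NULL,        -- which of the ~1000 signals
        value      REAL,
        PRIMARY KEY (trend_time, trend_id)
    );
""")

# Two seconds of made-up data for three signals.
for t in (1000, 1001):
    conn.execute("INSERT INTO trend_second VALUES (?)", (t,))
    conn.executemany(
        "INSERT INTO trend_value VALUES (?, ?, ?)",
        [(t, trend_id, trend_id * 1.5 + (t - 1000)) for trend_id in (1, 2, 3)],
    )
conn.commit()

# History of a single signal, ordered by time.
history = conn.execute(
    "SELECT trend_time, value FROM trend_value "
    "WHERE trend_id = ? ORDER BY trend_time",
    (2,),
).fetchall()
print(history)
```

Note the trade-off: this layout produces one row per signal per second, so at roughly 1000 signals it generates far more rows than the 20-column tables, in exchange for not needing a schema change when a signal is added.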
