简体   繁体   中英

Time series data storage

I'm collecting a large number of UDP packets (time dependant) coming from a service on the same network. These packets are being deserialised into structures that contain numbers (float and int) in memory and processed. We could say we are collecting time series data. However, it's not the kind of time series data that you get from supervising a service (mostly the same value for a period of time). These values constantly vary, true, not by very much. But they vary, nevertheless.

Besides this, I would like to send that data to a server in the cloud and on that server store the time series data.

My question is: what possibilities are there to compress the data in order to send smaller packets over the wire to the server (we could send the incoming UDP packets in batches over TCP) and store them? I'm particularly interested in not using the whole storage I have attached to the server. The data for one session is close to 32MB and i would have multiple sessions at the same time. One session's data is not related to another session. They are totally independent.

You can compress time series data using this library: https://github.com/dgryski/go-tsz

It's based on this paper from Facebook: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf

We have found that about 96% of all time stamps can be compressed to a single bit.

[...]

Roughly 51% of all values are compressed to a single bit since the current and previous values are identical. About 30% of the values are compressed with the control bits '10' (case b), with an average compressed size of 26.6 bits. The remaining 19% are compressed with control bits '11', with an average size of 36.9 bits, due to the extra 13 bits of overhead required to encode the length of leading zero bits and meaningful bits.

You can use a key-value store like bolt (or probably better: rocksdb which supports compression) and store multiple points for each key. For example you could store one key-value pair every 10 minutes, where the value would be all of the points that occurred during that 10 minute window.

This should give you both good performance and high compression.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM