
Suggestions on how to store and retrieve time-series data

Original post: 2017-09-07 08:52:31 8 1 database/ time-series/ influxdb

I am currently working on a project that requires us to store a large amount of time-series data and, more importantly, to retrieve large amounts of it quickly.

There will be N devices (>10,000) which will periodically send data to the system, let's say every 5 seconds. This data will quickly build up, but we are generally only interested in the most recent data, and want to compact the older data. We don't want to remove it, as it is still useful, but instead of having thousands of data points for a day, we might save just 5 or 10 once a certain number of days, weeks, or months have passed.

Specifically we want to be able to fetch sampled data over a large time period, say a year or two. There might be millions of points here, but we just want a small, linearly distributed, sample of this data.
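To make the request concrete, here is a minimal sketch in plain Python of the kind of sampling meant here. It assumes the points are already sorted by time and arrive at a roughly regular interval, so spacing by position approximates spacing by time. The name `sample_evenly` is hypothetical and not part of any database API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (unix timestamp in seconds, value)


def sample_evenly(points: Sequence[Point], k: int) -> List[Point]:
    """Return at most k points spread evenly by position across `points`.

    `points` is assumed to be sorted by timestamp. The first and last
    points are always kept when 2 <= k < len(points).
    """
    n = len(points)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        return list(points)
    if k == 1:
        return [points[n // 2]]
    # Integer arithmetic spreads k distinct indices over [0, n - 1].
    return [points[(i * (n - 1)) // (k - 1)] for i in range(k)]
```

In a real system this selection would ideally happen inside the database, so that millions of rows do not have to be transferred just to keep a handful of them.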

Today we are experimenting with InfluxDB, which initially seemed like an alright solution. It was fast enough and allowed us to store our data in a reasonable structure, but we have found that it is not completely satisfactory. We were unable to perform the sample query described above, and in general the system does not feel mature enough for us.

Any advice on how we can proceed, or alternative solutions, is much appreciated.

1 answer

You might be interested in looking at TimescaleDB:

https://github.com/timescale/timescaledb

It builds a time-series DB on top of Postgres and so offers full SQL support, as well as the Postgres ecosystem and reliability more generally. This can give you a lot more query flexibility, which sounds like what you want.

In terms of your specific use case, there would really be two solutions.

First, what people typically do is create two "hypertables", one for raw data and another for sampled data. These hypertables look like standard tables to the user, although they are heavily partitioned under the covers for much better scalability (e.g., 20x insert throughput vs. Postgres for large table sizes).

Then you basically do a roll-up from the raw table to the sampled table, and use a different data retention policy on each (so you keep raw data for, say, 1 month, and sampled data for years).

http://docs.timescale.com/getting-started/setup/starting-from-scratch
http://docs.timescale.com/api/data-retention
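As a rough illustration of the roll-up-plus-retention idea, here is a sketch in plain Python rather than SQL. It is not TimescaleDB's API: the names `Rollup`, `roll_up`, and `apply_retention` are hypothetical, and the in-memory lists merely stand in for the raw and sampled tables.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

Row = Tuple[int, float]  # (unix timestamp in seconds, value)


class Rollup(NamedTuple):
    bucket_start: int  # start of the time bucket, unix seconds
    count: int
    minimum: float
    maximum: float
    mean: float


def roll_up(raw: Iterable[Row], bucket_seconds: int) -> List[Rollup]:
    """Aggregate raw rows into fixed-width time buckets."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in raw:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        Rollup(start, len(vals), min(vals), max(vals), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]


def apply_retention(raw: Iterable[Row], now: int, keep_seconds: int) -> List[Row]:
    """Keep only raw rows that are at most keep_seconds old."""
    cutoff = now - keep_seconds
    return [(ts, value) for ts, value in raw if ts >= cutoff]
```

The roll-up would run before the retention step, so that each bucket's count, min, max, and mean survive after the raw rows behind them are dropped.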

Second, you can go with a single hypertable, and then just schedule a normal SQL query to delete individual rows from data that's older than a certain time period.
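The thinning logic such a scheduled job would apply can be sketched as follows, again in plain Python with hypothetical names (`thin_old_rows` is not a TimescaleDB function). It assumes that keeping the first row of each time bucket is an acceptable sample; rows newer than the cutoff are left untouched.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[int, float]  # (unix timestamp in seconds, value)


def thin_old_rows(
    rows: Iterable[Row], now: int, max_age_seconds: int, bucket_seconds: int
) -> List[Row]:
    """Drop old rows, keeping the earliest row in each time bucket.

    Rows with a timestamp >= now - max_age_seconds are always kept.
    """
    cutoff = now - max_age_seconds
    seen_buckets: Set[int] = set()
    kept: List[Row] = []
    for ts, value in sorted(rows):
        if ts >= cutoff:
            kept.append((ts, value))  # recent data stays at full resolution
            continue
        bucket = ts - ts % bucket_seconds
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            kept.append((ts, value))  # first row of an old bucket survives
    return kept
```

Unlike a roll-up, this keeps straight samples only, so no statistics about the deleted rows are preserved.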

We might even add better first-class support for this latter approach in the future if it becomes a commonly requested feature, although most use cases we've encountered to date seemed more focused on #1, especially in order to keep statistical data about removed data points, as opposed to just straight samples.

(Disclaimer: I'm one of the authors of TimescaleDB.)

Best way to store high frequency, periodic time-series data?

How can I efficiently store rapidly changing time-series data in mongodb?

Grouping time-series data by time intervals

Database Implementation Help : Time-Series data

Storing time-series data, relational or non?

Database solution for static time-series data

When to save time-series data

How to store many years worth of 100 x 25 Hz time-series - Sql Server or timeseries database

Which database should I use for this kind of time-series data?

storing time-series data in a database or binary file





solution1 2 2017-09-09 16:35:30
