
How should I auto-expire entries in an ETS table, while also limiting its total size?

Original 2015-05-30 18:10:18 3 3 erlang, ets

I have a lot of analytics data which I'm looking to aggregate periodically (let's say once a minute). The data is sent to a process which stores it in an ETS table, and at each interval a timer sends that process a message telling it to process the table and remove old data.

The problem is that the amount of data that comes in varies wildly, and I basically need to do two things to it:

If the amount of data coming in is too big, drop the oldest data and push the new data in. This could be viewed as a fixed size queue, where if the amount of data hits the limit, the queue would start dropping things from the front as new data comes to the back.
If the queue isn't full, but the data has been sitting there for a while, automatically discard it (after a fixed timeout.)

If these two conditions are kept, I could basically assume the table has a constant size, and everything in it is newer than X.

The problem is that I haven't found an efficient way to do these two things together. I know I could use match specs to delete all entries older than X, which should be pretty fast if the key is the timestamp, though I'm not sure this is the best way to periodically trim the table.
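A minimal sketch of the match-spec idea, assuming the table stores `{Timestamp, Payload}` tuples with an integer timestamp as the key. The module name, table name, and tuple layout are hypothetical, chosen only for illustration; `ets:select_delete/2` removes every object for which the match spec returns `true` and reports how many were deleted.

```erlang
-module(ets_expire_sketch).
-export([new/0, insert/3, expire_older_than/2]).

%% Hypothetical layout: {Timestamp, Payload}, keyed by an integer timestamp.
new() ->
    ets:new(analytics, [ordered_set, public]).

insert(Tab, Timestamp, Payload) when is_integer(Timestamp) ->
    ets:insert(Tab, {Timestamp, Payload}).

%% Delete every entry whose key is strictly below Cutoff.
%% Returns the number of deleted objects.
expire_older_than(Tab, Cutoff) when is_integer(Cutoff) ->
    MatchSpec = [{{'$1', '_'}, [{'<', '$1', Cutoff}], [true]}],
    ets:select_delete(Tab, MatchSpec).
```

The periodic timer message would then call something like `expire_older_than(Tab, Now - MaxAge)`, where `Now` and `MaxAge` use the same unit as the stored timestamps.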

The second problem is keeping the total table size under a certain limit, which I'm not really sure how to do. One solution that comes to mind is to use an auto-increment field with each insert, and when the table is being trimmed, look at the first and the last index, calculate the difference and, again, use match specs to delete everything below the threshold.
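One possible way to enforce the size limit, sketched under the assumption that the table is an `ordered_set` whose keys grow with insertion order (a timestamp or an auto-increment counter). Instead of computing an index difference, this variant reads the current size with `ets:info(Tab, size)` and deletes from the front with `ets:first/1` until the table fits. The module and function names are hypothetical.

```erlang
-module(ets_cap_sketch).
-export([trim_to/2]).

%% Drop the smallest keys until at most MaxSize entries remain.
%% Assumes an ordered_set table, so ets:first/1 yields the oldest key.
%% Returns the number of entries that were deleted.
trim_to(Tab, MaxSize) when is_integer(MaxSize), MaxSize >= 0 ->
    trim(Tab, ets:info(Tab, size) - MaxSize, 0).

trim(_Tab, Excess, Deleted) when Excess =< 0 ->
    Deleted;
trim(Tab, Excess, Deleted) ->
    case ets:first(Tab) of
        '$end_of_table' ->
            Deleted;
        Key ->
            true = ets:delete(Tab, Key),
            trim(Tab, Excess - 1, Deleted + 1)
    end.
```

Calling `trim_to/2` after each insert (or on each timer tick) keeps the table at or below the limit, at the cost of one delete per excess entry.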

Having said all this, it feels that I might be using the ETS table for something it wasn't designed to do. Is there a better way to store data like this, or am I approaching the problem correctly?

3 answers

You can determine the amount of memory the table occupies using ets:info(Tab, memory). The result is given in words, not bytes; to get the size in bytes, multiply it by erlang:system_info(wordsize). There is a catch, though: if you are storing binaries, only heap binaries are included in that figure. So if you are storing mostly ordinary Erlang terms, you can rely on it, and combined with a timestamp key as you described, it is a reasonable way to go.
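As a small illustration of the conversion described above (the module and function names are hypothetical):

```erlang
-module(ets_mem_sketch).
-export([table_bytes/1]).

%% ets:info(Tab, memory) reports words allocated to the table;
%% multiplying by the word size gives bytes.
table_bytes(Tab) ->
    ets:info(Tab, memory) * erlang:system_info(wordsize).
```

A trimming routine could compare `table_bytes(Tab)` against a byte budget instead of counting entries, keeping in mind the caveat about binaries.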

I haven't used ETS for anything like this, but in other NoSQL DBs (DynamoDB) an easy solution is to use multiple tables: If you're keeping 24 hours of data, then keep 24 tables, one for each hour of the day. When you want to drop data, drop one whole table.
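A rough sketch of how the multiple-table idea might translate to ETS, assuming integer timestamps and a fixed bucket width in the same unit (for example 3600 for hourly buckets of second-resolution timestamps). The module name, the state shape, and the choice of `duplicate_bag` are assumptions made for illustration, not something the answer prescribes.

```erlang
-module(ets_bucket_sketch).
-export([new/1, insert/3, drop_before/2]).

%% State: {BucketSize, #{BucketIndex => TableId}}.
new(BucketSize) when is_integer(BucketSize), BucketSize > 0 ->
    {BucketSize, #{}}.

%% Insert into the table for the bucket that contains Timestamp,
%% creating that table on first use. Returns the updated state.
insert({BucketSize, Tables}, Timestamp, Payload) ->
    Bucket = Timestamp div BucketSize,
    {Tab, Tables1} =
        case maps:find(Bucket, Tables) of
            {ok, Existing} ->
                {Existing, Tables};
            error ->
                New = ets:new(bucket, [duplicate_bag, public]),
                {New, Tables#{Bucket => New}}
        end,
    true = ets:insert(Tab, {Timestamp, Payload}),
    {BucketSize, Tables1}.

%% Delete every bucket table that lies entirely before Cutoff.
drop_before({BucketSize, Tables}, Cutoff) ->
    CutoffBucket = Cutoff div BucketSize,
    Keep = maps:fold(
             fun(Bucket, Tab, Acc) when Bucket < CutoffBucket ->
                     true = ets:delete(Tab),
                     Acc;
                (Bucket, Tab, Acc) ->
                     Acc#{Bucket => Tab}
             end, #{}, Tables),
    {BucketSize, Keep}.
```

Expiry then costs one `ets:delete/1` per expired bucket rather than one delete per entry, but the granularity of expiry is limited to the bucket width.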

I would do the following: create a server responsible for

receiving all the data storage messages. These messages should be timestamped by the client process (so it doesn't matter if they wait a little in the message queue). The server then stores them in the ETS table, configured as ordered_set and using the timestamp, converted to an integer, as the key (if the timestamps are produced by the function erlang:now in one single VM they will all be different; if you are using several nodes, you will need to add some information such as the node name to guarantee uniqueness).
receiving a tick (using for example timer:send_interval) and then processing the messages received in the last N µsec, by starting from Key = current time - N, looking up ets:next(Table,Key), and continuing to the last message. Finally, you can discard all the messages via ets:delete_all_objects(Table). If you had to add information such as a node name, it is still possible to use the next function (for example, if the keys are {TimeStamp:int(),Node:atom()}, you can compare against {Time:int(),0}, since a number is smaller than any atom).
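The table-handling part of this answer could look roughly like the sketch below. It assumes keys of the form `{TimestampMicros, Node}` and leaves out the server process and the `timer:send_interval` wiring; the module and function names are hypothetical. It relies on `ets:next/2` accepting a key that is not present in an `ordered_set` table and returning the next key in term order.

```erlang
-module(ets_window_sketch).
-export([new/0, store/4, collect_since/2, flush/1]).

new() ->
    ets:new(window, [ordered_set, public]).

%% Key is {TimestampMicros, Node}; the client supplies the timestamp.
store(Tab, TimestampMicros, Node, Payload)
  when is_integer(TimestampMicros), is_atom(Node) ->
    ets:insert(Tab, {{TimestampMicros, Node}, Payload}).

%% Collect entries stamped at or after Since, oldest first.
%% {Since, 0} sorts before {Since, AnyAtom} because numbers sort
%% before atoms, so ets:next/2 lands on the first qualifying key.
collect_since(Tab, Since) when is_integer(Since) ->
    walk(Tab, ets:next(Tab, {Since, 0}), []).

walk(_Tab, '$end_of_table', Acc) ->
    lists:reverse(Acc);
walk(Tab, Key, Acc) ->
    [Entry] = ets:lookup(Tab, Key),
    walk(Tab, ets:next(Tab, Key), [Entry | Acc]).

%% Discard everything once the tick has been processed.
flush(Tab) ->
    ets:delete_all_objects(Tab).
```

On each tick the server would call `collect_since(Tab, Now - N)`, aggregate the result, and then call `flush(Tab)`.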




solution1
2 2015-05-31 12:22:26

solution2
1 2015-05-31 11:43:32

solution3
0 2015-05-31 21:18:13
