
Large number of timecourses in database

I have a rather large amount of data (~400 million datapoints) organized in a set of ~100,000 timecourses. This data may change every day and, for reasons of revision safety, has to be archived daily.

Obviously this is far too much data to archive in full every day, so I analyzed some sample data. Approximately 60 to 80% of the courses do not change at all between two days, and for the rest only a very limited number of elements change. All in all, I expect far fewer than 10 million datapoints to change per day.

The question is: how do I make use of this knowledge? I am aware of concepts like the delta trees used by SVN and similar techniques; however, I would prefer it if the database itself were capable of handling such semantic compression. We are using Oracle 11g for storage, so: is there a better way than a homebrew solution?

Clarification

I am talking about timecourses representing hourly energy flows. Such a timecourse might start in the past (e.g. in 2005), contains 8,760 elements per year (one per hour), and might end any time up to 2020 (currently). Each timecourse is identified by a unique string.

The courses themselves are more or less boring: "Course_XXX: 1.1.2005 0:00 5; 1.1.2005 1:00 5; 1.1.2005 2:00 7,5; ..."

My task is to make day-to-day changes in these courses visible, and to do so a snapshot has to be taken each day at a given time. My hope is that some lossless semantic compression will spare me from archiving ~20 GB per day.

Basically my source data looks like this:

Key | Value0 | ... | Value23

To archive that data, I need to add an additional dimension that directly or indirectly tells me when the data was loaded from the source system, so my archive table is

Key | LoadID | Value0 | ... | Value23

where LoadID is more or less the time at which the source DB was accessed.
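
For concreteness, a minimal sketch of such an archive table in Oracle SQL; all names here are hypothetical (not from the original post), and only three of the 24 hourly value columns are written out:

-- Hypothetical uncompressed archive: one row per course per daily load.
CREATE TABLE course_archive (
    course_key VARCHAR2(64) NOT NULL,  -- unique string identifying the timecourse
    load_id    NUMBER       NOT NULL,  -- identifies the daily snapshot run
    value0     NUMBER,                 -- hourly value for 0:00
    value1     NUMBER,                 -- hourly value for 1:00
    value2     NUMBER,                 -- value3 to value23 omitted for brevity
    CONSTRAINT course_archive_pk PRIMARY KEY (course_key, load_id)
);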

Now, compression in my scenario is easy: LoadIDs grow with each run, so I can store a range, i.e.

Key | LoadID1 | LoadID2 | Value0 | ... | Value23

where LoadID1 is the ID of the first load at which these 24 values were observed, and LoadID2 the ID of the last consecutive load at which they were observed.

In my scenario, this reduces the amount of data stored in the database to roughly 1/30th: a course that goes unchanged for a month collapses from ~30 daily rows into a single range row.
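
A minimal sketch of how a daily load could maintain such range rows, assuming a staging table staging_course that holds today's extract, a bind variable :new_load_id for the current run, LoadIDs that increment by exactly 1 per run, and NOT NULL values (so plain equality works); all names are hypothetical, and again only three of the 24 value columns are written out:

-- Range-compressed archive: one row per run of identical daily values.
CREATE TABLE course_archive_rng (
    course_key VARCHAR2(64) NOT NULL,
    load_id1   NUMBER       NOT NULL,  -- first load with these values
    load_id2   NUMBER       NOT NULL,  -- last consecutive load with these values
    value0     NUMBER,
    value1     NUMBER,
    value2     NUMBER,                 -- value3 to value23 omitted for brevity
    CONSTRAINT course_archive_rng_pk PRIMARY KEY (course_key, load_id1)
);

-- Step 1: extend the open range of every course whose values are unchanged
-- (load_id2 = :new_load_id - 1 relies on LoadIDs growing by 1 per run).
UPDATE course_archive_rng a
SET    a.load_id2 = :new_load_id
WHERE  a.load_id2 = :new_load_id - 1
AND    EXISTS (
         SELECT 1
         FROM   staging_course s
         WHERE  s.course_key = a.course_key
         AND    s.value0 = a.value0
         AND    s.value1 = a.value1
         AND    s.value2 = a.value2
       );

-- Step 2: open a new range for every course that changed or newly appeared.
INSERT INTO course_archive_rng
       (course_key, load_id1, load_id2, value0, value1, value2)
SELECT s.course_key, :new_load_id, :new_load_id, s.value0, s.value1, s.value2
FROM   staging_course s
WHERE  NOT EXISTS (
         SELECT 1
         FROM   course_archive_rng a
         WHERE  a.course_key = s.course_key
         AND    a.load_id2   = :new_load_id
       );

A pleasant side effect: the day-to-day changes become trivially visible, since the courses that changed (or first appeared) in a given run are exactly the rows whose load_id1 equals that run's LoadID.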
