
Database Design: how to track history?

What are the general strategies in DB design to maintain a revision history? If it were just one table I was dealing with, I think it wouldn't be so hard. Just save each update as a new record in the table. The last record will always be the latest revision.

But when the data is stored across multiple tables, what's a good way to design that so that it can track revisions?

I prefer to keep an additional history table for each versioned table. It has the same structure as the main table plus two extra fields, time_from and time_to, and is filled transparently by triggers. The time_to of the latest revision is set to a date in the far future.

The state at a given moment can be retrieved with a query like this:

SELECT * FROM user_history
WHERE time_from <= '2012-02-01' AND time_to > '2012-02-01'
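A minimal sketch of the trigger-filled history table, using SQLite through Python's sqlite3 module so it can be run as-is. The users table, its changed_at column, and the trigger names are assumptions made for illustration; they are not from the answer above. A revision is treated as valid for time_from <= t < time_to, and deletes are not handled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    -- Assumption: the application stamps each write with its own time,
    -- which keeps this example deterministic.
    changed_at TEXT NOT NULL
);

CREATE TABLE user_history (
    id        INTEGER NOT NULL,
    name      TEXT NOT NULL,
    time_from TEXT NOT NULL,
    time_to   TEXT NOT NULL,      -- far-future sentinel while current
    PRIMARY KEY (id, time_from)
);

-- Every insert opens the first revision.
CREATE TRIGGER users_ai AFTER INSERT ON users
BEGIN
    INSERT INTO user_history (id, name, time_from, time_to)
    VALUES (NEW.id, NEW.name, NEW.changed_at, '9999-12-31 00:00:00');
END;

-- Every update closes the current revision and opens a new one.
CREATE TRIGGER users_au AFTER UPDATE ON users
BEGIN
    UPDATE user_history
       SET time_to = NEW.changed_at
     WHERE id = NEW.id AND time_to = '9999-12-31 00:00:00';
    INSERT INTO user_history (id, name, time_from, time_to)
    VALUES (NEW.id, NEW.name, NEW.changed_at, '9999-12-31 00:00:00');
END;
""")

conn.execute("INSERT INTO users VALUES (1, 'alice', '2012-01-01 00:00:00')")
conn.execute(
    "UPDATE users SET name = 'alice2', changed_at = '2012-03-01 00:00:00' "
    "WHERE id = 1"
)


def state_at(moment):
    """Return the (id, name) rows that were valid at the given moment."""
    return conn.execute(
        "SELECT id, name FROM user_history "
        "WHERE time_from <= ? AND time_to > ? ORDER BY id",
        (moment, moment),
    ).fetchall()


before_rename = state_at("2012-02-01 00:00:00")
after_rename = state_at("2012-04-01 00:00:00")
```

The main table always holds only the current row, so ordinary queries and joins against it need no date conditions.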

In my opinion, storing history within the main table is generally not a good idea, as it requires complicated conditions when retrieving or joining current data.

The hard part is not the versioning of the "base" tables - you just version them individually as you would a single table in isolation.

The hard part is tracking connections between them.

How exactly you do that depends on the requirements of the particular project. Here is an example of how sales orders could be "historized", but many other variations are possible.
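To make "tracking connections" concrete, here is one possible variation, sketched with SQLite via Python's sqlite3 module. It is not the linked sales-order example: the product_history and order_line tables and all their columns are hypothetical. Each order line is joined to the product revision that was valid when the order was placed, so later price changes do not rewrite old orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical versioned table: one row per product revision.
CREATE TABLE product_history (
    product_id INTEGER NOT NULL,
    price      REAL NOT NULL,
    time_from  TEXT NOT NULL,
    time_to    TEXT NOT NULL,
    PRIMARY KEY (product_id, time_from)
);

-- Hypothetical referencing table: it stores only the product id and
-- the moment of the order, not a copy of the price.
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    ordered_at TEXT NOT NULL
);

INSERT INTO product_history VALUES
    (1, 10.0, '2012-01-01', '2012-06-01'),
    (1, 12.0, '2012-06-01', '9999-12-31');

INSERT INTO order_line VALUES
    (100, 1, 3, '2012-03-15'),
    (101, 1, 3, '2012-07-01');
""")

# Point-in-time join: pick the revision whose validity interval
# contains the order's timestamp.
order_totals = conn.execute("""
    SELECT ol.order_id, ol.quantity * ph.price AS total
      FROM order_line ol
      JOIN product_history ph
        ON ph.product_id = ol.product_id
       AND ph.time_from <= ol.ordered_at
       AND ph.time_to   >  ol.ordered_at
     ORDER BY ol.order_id
""").fetchall()
```

Another common variation is to store the revision's key (product_id plus time_from) on the order line itself; which one fits depends on the project, as noted above.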

Turn on MySQL's binary logging, then use that.

I use an approach where each object I deal with has at least one so-called instance table, in which I keep the data that tends to change over time. Typically such tables follow this concept:

  • they have the _HISTORY suffix in the name;
  • they have 2 extra fields, start_dt and end_dt, indicating the object instance's lifetime;
  • start_dt is NOT NULL, end_dt can be NULL, which indicates that the instance is current and is not limited in its time;
  • it is possible to insert future-dated changes: say you want a new company name to be activated from 1/Jan-2013, then you need to set end_dt of the current instance to 31/Dec-2012 23:59:59 and insert a new record with start_dt of 1/Jan-2013 00:00:00;
  • sometimes I also add a revision field, if it is necessary to track revisions.
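The future-dated company rename from the list above can be sketched as follows, using SQLite via Python's sqlite3 module. The company_history table and the schedule_rename and name_at helpers are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_history (
    company_id INTEGER NOT NULL,
    start_dt   TEXT NOT NULL,
    end_dt     TEXT,              -- NULL: current, open-ended instance
    name       TEXT NOT NULL,
    PRIMARY KEY (company_id, start_dt)
);

INSERT INTO company_history
VALUES (1, '2010-01-01 00:00:00', NULL, 'Old Name Ltd');
""")


def schedule_rename(company_id, new_name, old_end_dt, new_start_dt):
    """Close the open-ended instance and add the future-dated one."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE company_history SET end_dt = ? "
            "WHERE company_id = ? AND end_dt IS NULL",
            (old_end_dt, company_id),
        )
        conn.execute(
            "INSERT INTO company_history VALUES (?, ?, NULL, ?)",
            (company_id, new_start_dt, new_name),
        )


def name_at(company_id, moment):
    """Return the company name valid at the given moment, or None."""
    row = conn.execute(
        "SELECT name FROM company_history "
        "WHERE company_id = ? "
        "AND ? BETWEEN start_dt AND coalesce(end_dt, ?)",
        (company_id, moment, moment),
    ).fetchone()
    return row[0] if row else None


schedule_rename(1, "New Name Ltd", "2012-12-31 23:59:59", "2013-01-01 00:00:00")
name_in_2012 = name_at(1, "2012-06-01 00:00:00")
name_in_2013 = name_at(1, "2013-02-01 00:00:00")
```

Because the new row only becomes visible once the queried moment reaches its start_dt, the change can be entered well ahead of time.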

In order to have proper RI constraints with such a design, I always have 2 tables for versioned objects. Say, for the Customer object I have the following set of tables:

customer (customer_id INTEGER, PRIMARY KEY (customer_id));
customer_history (customer_id INTEGER, start_dt TIMESTAMP, end_dt TIMESTAMP,
                  name VARCHAR(50), sex CHAR(1), ...,
                  PRIMARY KEY (customer_id, start_dt));
customer_bank_history (customer_id INTEGER, start_dt TIMESTAMP, end_dt TIMESTAMP,
                       bank_id INTEGER, iban VARCHAR(34));

In all other places I use customer(customer_id) to build foreign keys. Querying current customer details is simple:

SELECT c.customer_id, ch.name, ch.sex
  FROM customer c
  JOIN customer_history ch ON c.customer_id = ch.customer_id
       AND now() BETWEEN ch.start_dt AND coalesce(ch.end_dt, now());
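A runnable sketch of the two-table layout and the query above, using SQLite via Python's sqlite3 module. Two things here are assumptions rather than part of the original design: customer_history is given an explicit foreign key to customer(customer_id), and the query takes the moment as a parameter instead of calling now(), so the result is reproducible. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
-- Stable identity: every other table points its foreign keys here.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);

-- Changing attributes: one row per instance of the customer.
CREATE TABLE customer_history (
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    start_dt    TEXT NOT NULL,
    end_dt      TEXT,             -- NULL: current instance
    name        TEXT,
    sex         TEXT,
    PRIMARY KEY (customer_id, start_dt)
);

INSERT INTO customer VALUES (1), (2);

INSERT INTO customer_history VALUES
    (1, '2011-01-01 00:00:00', '2011-12-31 23:59:59', 'Ann Smith', 'F'),
    (1, '2012-01-01 00:00:00', NULL,                  'Ann Jones', 'F'),
    (2, '2011-06-01 00:00:00', NULL,                  'Bob Brown', 'M');
""")


def customers_as_of(moment):
    """Return (customer_id, name, sex) for the instance valid at moment."""
    return conn.execute(
        """
        SELECT c.customer_id, ch.name, ch.sex
          FROM customer c
          JOIN customer_history ch
            ON c.customer_id = ch.customer_id
           AND ? BETWEEN ch.start_dt AND coalesce(ch.end_dt, ?)
         ORDER BY c.customer_id
        """,
        (moment, moment),
    ).fetchall()


current = customers_as_of("2012-06-01 00:00:00")
earlier = customers_as_of("2011-03-01 00:00:00")
```

Passing an earlier moment returns the historical state with the same query, which is why no separate "current" table is needed in this design.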

Why I prefer such design:

  1. I have versioned object instances on the database level by design;
  2. I have to maintain fewer tables;
  3. History cannot be lost if somebody drops or disables triggers;
  4. I can plan and maintain future-dated changes easily.

Hope this will help you.

Datadiff. API-powered DB revision tracking.

Full disclosure:

I built Datadiff. I needed a solution that provided a visual history of a data model in MongoDB to help support a SaaS product. It will work with SQL databases too.

You can do basic querying with key:val notation, e.g. id:123
