
Relational Databases: Current Data vs. Historical Data, Best Practice

Let's take a relational database, e.g. MySQL. To keep it simple, I will concentrate on the important things: there is a table that contains orders, with fields like order_id (primary key), order_date and a foreign key fk_supplier, which references the primary key of a table supplier. The supplier table also has a field called supplier_name.

Now let's imagine there is a PHP website that shows all orders in a table. Each row of the table consists of the order_id, order_date and supplier_name (the SQL statement joins the two tables). Everything is okay so far. Now someone changes the name of a supplier that is referenced by one of the orders: the historical data becomes wrong, because old orders are now displayed with the new name. My question is: what is the best practice to prevent this? Three solutions come to mind:

  1. Don't let the user change a supplier row that is referenced by orders. Make them add a new supplier if the name changes.
  2. Always save the current supplier data (e.g. the supplier name) with the order record, and don't use primary key / foreign key references.
  3. Introduce time slices: every time an important attribute of a supplier (like the name) is changed, create a new time slice. Reference not only the supplier_id in orders, but also the corresponding time slice.

All these approaches have advantages and disadvantages. Point 2, for example, seems pretty dirty and against all the rules of a relational database. Point 3 would usually be the way to go, in my opinion, but it needs a lot of programming effort. The user experience / usability gets pretty bad, too.

I would like to hear how experienced developers and database designers deal with this problem.
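The problem described in the question can be reproduced in a few lines. This is only a minimal sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, the table and column names follow the question, and the supplier names and dates are made-up sample data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database of the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id   INTEGER PRIMARY KEY,
        supplier_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT NOT NULL,
        fk_supplier INTEGER NOT NULL REFERENCES supplier(supplier_id)
    );
    -- Made-up sample data.
    INSERT INTO supplier VALUES (1, 'Acme Ltd');
    INSERT INTO orders VALUES (100, '2020-01-15', 1);
""")

# The join the website would run to list all orders.
ORDER_LIST_SQL = """
    SELECT o.order_id, o.order_date, s.supplier_name
    FROM orders AS o
    JOIN supplier AS s ON s.supplier_id = o.fk_supplier
    ORDER BY o.order_id
"""

before = conn.execute(ORDER_LIST_SQL).fetchall()

# Someone renames the supplier after the order was placed.
conn.execute(
    "UPDATE supplier SET supplier_name = 'Acme Global' WHERE supplier_id = 1"
)

after = conn.execute(ORDER_LIST_SQL).fetchall()

print(before)  # the 2020 order listed with the name valid at that time
print(after)   # the same order, now listed with the new name
```

The order row itself never changes, yet the listing does, because the name is resolved at query time rather than at order time.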

I would use a form of Option 3 in which the Supplier information has a StartDate and an EndDate. This way the data stays accurate over time (the supplier name is correct as it was at any given date). One thing you could also do is maintain a spreadsheet with the Supplier information whose contents get loaded into the database every night (into a fact_Supplier table or a lookup table). All edits to Suppliers go through this spreadsheet, with access given only to the select people who are in charge of that kind of thing. If there is a change in the spreadsheet, the prior record in the Supplier table is end-dated and a new record is inserted with the new information. Any change triggers this: Supplier name, Supplier address, etc.
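A rough sketch of the StartDate / EndDate idea, again using Python's sqlite3 as a stand-in for MySQL. The table name supplier_version, its columns, and the helper change_supplier_name are hypothetical names chosen for illustration; the nightly spreadsheet load is left out, and the convention that an open-ended version has a NULL end date is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: one row per supplier per validity period.
    CREATE TABLE supplier_version (
        version_id    INTEGER PRIMARY KEY,
        supplier_id   INTEGER NOT NULL,
        supplier_name TEXT NOT NULL,
        start_date    TEXT NOT NULL,  -- inclusive, ISO 8601 date
        end_date      TEXT            -- exclusive; NULL marks the current version
    );
    -- fk_supplier is not declared as a foreign key here, because
    -- supplier_id is no longer unique in supplier_version.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT NOT NULL,
        fk_supplier INTEGER NOT NULL
    );
    INSERT INTO supplier_version VALUES (1, 1, 'Acme Ltd', '2019-01-01', NULL);
    INSERT INTO orders VALUES (100, '2020-01-15', 1);
""")


def change_supplier_name(conn, supplier_id, new_name, change_date):
    """End-date the current version and insert a new one in one transaction."""
    with conn:
        conn.execute(
            "UPDATE supplier_version SET end_date = ? "
            "WHERE supplier_id = ? AND end_date IS NULL",
            (change_date, supplier_id),
        )
        conn.execute(
            "INSERT INTO supplier_version "
            "(supplier_id, supplier_name, start_date, end_date) "
            "VALUES (?, ?, ?, NULL)",
            (supplier_id, new_name, change_date),
        )


change_supplier_name(conn, 1, "Acme Global", "2021-06-01")
conn.execute("INSERT INTO orders VALUES (101, '2022-03-10', 1)")

# Pick the supplier version that was valid on each order's date.
rows = conn.execute("""
    SELECT o.order_id, o.order_date, v.supplier_name
    FROM orders AS o
    JOIN supplier_version AS v
      ON v.supplier_id = o.fk_supplier
     AND o.order_date >= v.start_date
     AND (v.end_date IS NULL OR o.order_date < v.end_date)
    ORDER BY o.order_id
""").fetchall()

version_count = conn.execute("SELECT COUNT(*) FROM supplier_version").fetchone()[0]
print(rows)
```

With this layout the order from 2020 keeps showing the old name and the order from 2022 shows the new one, at the cost of a date-range condition in every join.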

Will expand on my comment:

I would choose Option 2 but with a few modifications:

  • The Supplier table should stay as it is.
  • The Order table should keep referencing the Supplier table.
  • Create a new table, e.g. OrderInvoiceDetails, which has a 1-to-1 relationship to the Order table and an FK to Supplier. This table contains a snapshot of the supplier details.
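The three points above could look roughly like this. It is a sketch under assumptions, not a definitive schema: Python's sqlite3 stands in for MySQL, the table name OrderInvoiceDetails comes from the list above, and its columns as well as the helper place_order are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id   INTEGER PRIMARY KEY,
        supplier_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT NOT NULL,
        fk_supplier INTEGER NOT NULL REFERENCES supplier(supplier_id)
    );
    -- 1-to-1 with orders: the primary key is also the reference to the order.
    CREATE TABLE OrderInvoiceDetails (
        order_id      INTEGER PRIMARY KEY REFERENCES orders(order_id),
        fk_supplier   INTEGER NOT NULL REFERENCES supplier(supplier_id),
        supplier_name TEXT NOT NULL  -- snapshot taken when the order is placed
    );
    INSERT INTO supplier VALUES (1, 'Acme Ltd');
""")


def place_order(conn, order_id, order_date, supplier_id):
    """Insert the order and copy the current supplier details alongside it."""
    with conn:  # both inserts commit together or not at all
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (order_id, order_date, supplier_id),
        )
        conn.execute(
            "INSERT INTO OrderInvoiceDetails (order_id, fk_supplier, supplier_name) "
            "SELECT ?, supplier_id, supplier_name FROM supplier "
            "WHERE supplier_id = ?",
            (order_id, supplier_id),
        )


place_order(conn, 100, "2020-01-15", 1)
conn.execute(
    "UPDATE supplier SET supplier_name = 'Acme Global' WHERE supplier_id = 1"
)
place_order(conn, 101, "2022-03-10", 1)

# Historical listing reads the snapshot, not the live supplier row.
rows = conn.execute("""
    SELECT o.order_id, o.order_date, d.supplier_name
    FROM orders AS o
    JOIN OrderInvoiceDetails AS d ON d.order_id = o.order_id
    ORDER BY o.order_id
""").fetchall()

current_name = conn.execute(
    "SELECT supplier_name FROM supplier WHERE supplier_id = 1"
).fetchone()[0]
print(rows, current_name)
```

The order listing needs only a plain equality join, while the Supplier table still holds the current name for new orders.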

Pros:

  • Easy and efficient querying
  • Invoice details can be modified separately from the Supplier table if need be.
  • I would argue that logically this is the best solution, as you want to store supplier details in relation to a specific order rather than store Supplier history data.
  • Old data can easily be archived together with the Orders data.

Cons:

  • Stores redundant data, especially for suppliers whose details do not change often


 