Is it a bad idea to keep a subtotal field in the database?

I have a MySQL table that represents a list of orders and a related child table that represents the shipments associated with each order (some orders have more than one shipment, but most have just one).

Each shipment has a number of costs, for example:

  • ItemCost
  • ShippingCost
  • HandlingCost
  • TaxCost

There are many places in the application where I need to get consolidated information for the order such as:

  • TotalItemCost
  • TotalShippingCost
  • TotalHandlingCost
  • TotalTaxCost
  • TotalCost
  • TotalPaid
  • TotalProfit

All those fields are dependent on the aggregated values in the related shipment table. This information is used in other queries, reports, screens, etc., some of which have to return a result on tens of thousands of records quickly for a user.

As I see it, there are a few basic ways to go with this:

  1. Use a subquery to calculate these items from the shipment table whenever they are needed. This complicates things quite a bit for all the queries that need all or part of this information. It is also slow.

  2. Create a view that exposes the subqueries as simple fields. This keeps the reports that need them simple.

  3. Add these fields in the order table. These would give me the performance I am looking for, at the expense of having to duplicate data and calculate it when I make any changes to the shipment records.

One other thing, I am using a business layer that exposes functions to get this data (for example GetOrders(filter)) and I don't need the subtotals each time (or only some of them some of the time), so generating a subquery each time (even from a view) is probably a bad idea.

Are there any best practices that anybody can point me to that would help me decide on the best design for this?

Incidentally, I ended up doing #3 primarily for performance and query simplicity reasons.
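As a rough sketch of what #3 involves, assuming the tables are called orders and shipments, that shipments has an order_id column, and that the costs are stored as DECIMAL(12,2) (all of these names and types are assumptions, not taken from the real schema), the extra columns and a one-off backfill might look like this. TotalPaid and TotalProfit are left out because they depend on payment data that is not described here.

```sql
-- Hypothetical schema: orders(id), shipments(order_id, ItemCost, ShippingCost, HandlingCost, TaxCost)
ALTER TABLE orders
  ADD COLUMN TotalItemCost     DECIMAL(12,2) NOT NULL DEFAULT 0,
  ADD COLUMN TotalShippingCost DECIMAL(12,2) NOT NULL DEFAULT 0,
  ADD COLUMN TotalHandlingCost DECIMAL(12,2) NOT NULL DEFAULT 0,
  ADD COLUMN TotalTaxCost      DECIMAL(12,2) NOT NULL DEFAULT 0,
  ADD COLUMN TotalCost         DECIMAL(12,2) NOT NULL DEFAULT 0;

-- One-off backfill from the existing shipment rows.
-- Orders without shipments keep totals of 0 thanks to the LEFT JOIN + IFNULL.
UPDATE orders o
LEFT JOIN (
    SELECT order_id,
           SUM(ItemCost)     AS item_sum,
           SUM(ShippingCost) AS ship_sum,
           SUM(HandlingCost) AS hand_sum,
           SUM(TaxCost)      AS tax_sum
    FROM shipments
    GROUP BY order_id
) s ON s.order_id = o.id
SET o.TotalItemCost     = IFNULL(s.item_sum, 0),
    o.TotalShippingCost = IFNULL(s.ship_sum, 0),
    o.TotalHandlingCost = IFNULL(s.hand_sum, 0),
    o.TotalTaxCost      = IFNULL(s.tax_sum, 0),
    o.TotalCost         = IFNULL(s.item_sum, 0) + IFNULL(s.ship_sum, 0)
                        + IFNULL(s.hand_sum, 0) + IFNULL(s.tax_sum, 0);
```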

Update:

Got lots of great feedback pretty quickly, thank you all. To give a bit more background, one of the places the information is shown is on the admin console, where I have a potentially very long list of orders and need to show TotalCost, TotalPaid, and TotalProfit for each.

There's absolutely nothing wrong with doing rollups of your statistical data and storing them to enhance application performance. Just keep in mind that you will probably need to create a set of triggers or jobs to keep the rollups in sync with your source data.

I would probably go about this by caching the subtotals in the database for fastest query performance if most of the time you're doing reads instead of writes. Create an update trigger to recalculate the subtotal when the row changes.

I would only use a view to calculate them on SELECT if the number of rows was typically pretty small and access somewhat infrequent. Performance will be much better if you cache them.
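For comparison, the view-based variant could look roughly like the sketch below. The table names (orders, shipments), the order_id column and the assumption that the cost columns are NOT NULL are mine; only the cost column names come from the question.

```sql
-- Hypothetical schema: shipments(order_id, ItemCost, ShippingCost, HandlingCost, TaxCost)
CREATE VIEW order_totals AS
SELECT s.order_id,
       SUM(s.ItemCost)     AS TotalItemCost,
       SUM(s.ShippingCost) AS TotalShippingCost,
       SUM(s.HandlingCost) AS TotalHandlingCost,
       SUM(s.TaxCost)      AS TotalTaxCost,
       SUM(s.ItemCost + s.ShippingCost + s.HandlingCost + s.TaxCost) AS TotalCost
FROM shipments s
GROUP BY s.order_id;

-- Only the queries that need totals join the view.
SELECT o.id, t.TotalCost
FROM orders o
LEFT JOIN order_totals t ON t.order_id = o.id
WHERE o.id BETWEEN 1245 AND 1299;
```

Because the view contains aggregates, MySQL cannot merge it into the outer query and has to compute it separately, which is one reason this tends to get slow as the shipment table grows.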

Option 3 is the fastest
If and when you are running into performance issues and if you cannot solve these any other way, option #3 is the way to go.

Use triggers to do the updating
You should use triggers after insert, update and delete to keep the subtotals in your order table in sync with the underlying data.
Take special care when retrospectively changing prices and similar reference data, as this will require a full recalculation of all subtotals. So you will need quite a few triggers, most of which will do very little most of the time.
Note that if a tax rate changes, it normally only changes going forward, for orders that you don't have yet, so existing subtotals are not affected.

If the triggers take a lot of time, make sure you do these updates in off-peak hours.
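A minimal sketch of such triggers, assuming a child table shipments(order_id, ItemCost, ShippingCost, HandlingCost, TaxCost) and cached Total* columns on orders; the procedure name, trigger names and DECIMAL(12,2) types are made up for illustration. It simply recomputes the totals of the affected order rather than applying deltas, which is slower per write but harder to get wrong.

```sql
DELIMITER //

-- Recompute the cached totals for one order from its shipment rows.
CREATE PROCEDURE refresh_order_totals(IN p_order_id INT)
BEGIN
  DECLARE v_item, v_ship, v_hand, v_tax DECIMAL(12,2) DEFAULT 0;

  -- An aggregate without GROUP BY always returns exactly one row.
  SELECT IFNULL(SUM(ItemCost), 0), IFNULL(SUM(ShippingCost), 0),
         IFNULL(SUM(HandlingCost), 0), IFNULL(SUM(TaxCost), 0)
    INTO v_item, v_ship, v_hand, v_tax
  FROM shipments
  WHERE order_id = p_order_id;

  UPDATE orders
  SET TotalItemCost     = v_item,
      TotalShippingCost = v_ship,
      TotalHandlingCost = v_hand,
      TotalTaxCost      = v_tax,
      TotalCost         = v_item + v_ship + v_hand + v_tax
  WHERE id = p_order_id;
END//

CREATE TRIGGER shipments_after_insert AFTER INSERT ON shipments
FOR EACH ROW
BEGIN
  CALL refresh_order_totals(NEW.order_id);
END//

CREATE TRIGGER shipments_after_update AFTER UPDATE ON shipments
FOR EACH ROW
BEGIN
  CALL refresh_order_totals(NEW.order_id);
  -- If the shipment was moved to another order, fix the old order too.
  IF OLD.order_id <> NEW.order_id THEN
    CALL refresh_order_totals(OLD.order_id);
  END IF;
END//

CREATE TRIGGER shipments_after_delete AFTER DELETE ON shipments
FOR EACH ROW
BEGIN
  CALL refresh_order_totals(OLD.order_id);
END//

DELIMITER ;
```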

Run an automatic check periodically to make sure the cached values are correct
You may also want to keep a golden subquery in place that calculates all the values and checks them against the stored values in the order table.
Run this query every night and have it report any abnormalities, so that you can see when the denormalized values are out-of-sync.
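Such a check could be sketched as follows, again assuming orders(id, TotalCost) and shipments(order_id, ItemCost, ShippingCost, HandlingCost, TaxCost) with DECIMAL columns so that an exact comparison is meaningful. These names are assumptions; the same pattern extends to the other cached columns.

```sql
-- Returns one row per order whose stored total disagrees with the recomputed one.
SELECT o.id,
       o.TotalCost              AS stored_total,
       IFNULL(c.calc_total, 0)  AS calculated_total
FROM orders o
LEFT JOIN (
    SELECT order_id,
           SUM(ItemCost + ShippingCost + HandlingCost + TaxCost) AS calc_total
    FROM shipments
    GROUP BY order_id
) c ON c.order_id = o.id
-- <=> is MySQL's NULL-safe equality, so a NULL stored total is also reported.
WHERE NOT (o.TotalCost <=> IFNULL(c.calc_total, 0));
```

An empty result means the denormalized values are in sync; anything else is worth an alert.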

Do not do any invoicing on orders that have not been processed by the validation query
Add an extra datetime field to the order table called timeoflastsuccesfullvalidation and have it set to null if the validation was unsuccessful.
Only invoice items with a timeoflastsuccesfullvalidation less than 24 hours ago.
Of course you don't need to check orders that are fully processed, only orders that are pending.
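One way this could be wired up is sketched below. It assumes a status column on orders with a 'pending' value, which is hypothetical, plus the same orders/shipments layout and cached TotalCost column used for the subtotals.

```sql
-- Nightly job: stamp pending orders whose cached total matches, clear the stamp otherwise.
UPDATE orders o
LEFT JOIN (
    SELECT order_id,
           SUM(ItemCost + ShippingCost + HandlingCost + TaxCost) AS calc_total
    FROM shipments
    GROUP BY order_id
) c ON c.order_id = o.id
SET o.timeoflastsuccesfullvalidation =
      IF(o.TotalCost <=> IFNULL(c.calc_total, 0), NOW(), NULL)
WHERE o.status = 'pending';

-- Invoicing only picks up orders validated within the last 24 hours.
SELECT o.id
FROM orders o
WHERE o.status = 'pending'
  AND o.timeoflastsuccesfullvalidation >= NOW() - INTERVAL 24 HOUR;
```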

Option 1 may be fast enough
With regard to #1

It is also slow.

That depends a lot on how you query the DB.
You mention subselects. In the mostly complete skeleton query below I don't see the need for many subselects, so you have me a bit puzzled there.

SELECT field1, field2, field3
       , oifield1, oifield2, oifield3
       , NettItemCost * (1 + TaxRate) AS TotalItemCost
       , TotalShippingCost
       , TotalHandlingCost
       , NettItemCost * TaxRate AS TotalTaxCost
       , (NettItemCost * (1 + TaxRate)) + TotalShippingCost + TotalHandlingCost AS TotalCost
       , TotalPaid
       , somethingorother AS TotalProfit  -- placeholder: the profit formula is not specified
FROM (
  -- One row per order item; prices, tax and costs are looked up by the order date.
  SELECT o.field1, o.field2, o.field3
         , oi.field1 AS oifield1, oi.field2 AS oifield2, oi.field3 AS oifield3
         , SUM(c.productprice * oi.qty) AS NettItemCost
         , SUM(IFNULL(sc.shippingperkg, 0) * oi.qty * p.WeightInKg) AS TotalShippingCost
         , SUM(IFNULL(hc.handlingperwhatever, 0) * oi.qty) AS TotalHandlingCost
         , t.taxrate AS TaxRate
         , IFNULL(paid.amountpaid, 0) AS TotalPaid
  FROM orders o
  INNER JOIN orderitem oi ON (oi.order_id = o.id)
  INNER JOIN products p ON (p.id = oi.product_id)
  INNER JOIN prices c ON (c.product_id = p.id
                       AND o.orderdate BETWEEN c.validfrom AND c.validuntil)
  INNER JOIN taxes t ON (p.tax_id = t.tax_id
                       AND o.orderdate BETWEEN t.validfrom AND t.validuntil)
  LEFT JOIN shippingcosts sc ON (o.country = sc.country
                       AND o.orderdate BETWEEN sc.validfrom AND sc.validuntil)
  LEFT JOIN handlingcost hc ON (hc.id = oi.handlingcost_id
                       AND o.orderdate BETWEEN hc.validfrom AND hc.validuntil)
  -- A derived table cannot reference o.id directly, so group payments and join on order_id.
  LEFT JOIN (SELECT pay.order_id, SUM(pay.payment) AS amountpaid
             FROM payment pay
             GROUP BY pay.order_id) paid ON (paid.order_id = o.id)
  WHERE o.id BETWEEN 1245 AND 1299
  -- GROUP BY ... DESC no longer parses in MySQL 8.0; add an ORDER BY if the order matters.
  GROUP BY o.id, oi.id ) AS sub

Thinking about it, you would need to split this query up into the parts that are relevant per order and the parts that are relevant per order item, but I'm too lazy to do that now.

Speed tips
Make sure you have indexes on all fields involved in the join-criteria.
Use a MEMORY table for the smaller tables, like tax and shippingcost, and use a hash index for the ids in the memory tables.
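As an illustration of both tips, using the table names from the skeleton query: the index names, the taxes_mem copy and its column types are assumptions. Keep in mind that MEMORY tables lose their contents when the server restarts, so the copy has to be reloaded from the real table.

```sql
-- Indexes on the join columns of the larger tables.
CREATE INDEX idx_orderitem_order ON orderitem (order_id);
CREATE INDEX idx_prices_product_valid ON prices (product_id, validfrom, validuntil);
CREATE INDEX idx_payment_order ON payment (order_id);

-- In-memory copy of a small lookup table with a hash index on the id.
-- tax_id is not unique here because each rate has its own validity period.
CREATE TABLE taxes_mem (
  tax_id     INT          NOT NULL,
  taxrate    DECIMAL(6,4) NOT NULL,
  validfrom  DATE         NOT NULL,
  validuntil DATE         NOT NULL,
  INDEX idx_taxes_mem_id USING HASH (tax_id)
) ENGINE = MEMORY;

-- Reload after every server restart and whenever taxes changes.
INSERT INTO taxes_mem (tax_id, taxrate, validfrom, validuntil)
SELECT tax_id, taxrate, validfrom, validuntil
FROM taxes;
```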

I would avoid #3 as much as I can, for several reasons:

  1. It's too hard to discuss performance without measurement. Imagine the user is shopping around, adding order items to an order; every time an item is added, you need to update the order record, which may not be necessary (some sites only show the order total when you click the shopping cart and are ready to check out).

  2. Having a duplicated column is asking for bugs - you cannot expect every future developer/maintainer to be aware of this extra column. Triggers can help, but I think triggers should only be used as a last resort to address a bad database design.

  3. A different database schema can be used for reporting purposes. The reporting database can be highly denormalized for performance without complicating the main application.

  4. I tend to put the actual logic for computing the subtotal in the application layer, because "subtotal" is an overloaded term that means different things in different contexts - sometimes you want the "raw subtotal", sometimes you want the subtotal after applying a discount. You cannot keep adding columns to the order table for every scenario.

It's not a bad idea. Unfortunately, MySQL doesn't have some features that would make this really easy - computed columns and indexed (materialized) views. You can probably simulate it with a trigger.
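For what it's worth, MySQL 5.7 and later do support generated columns, but only over columns of the same row, so they can cover a per-shipment total and not the per-order rollup, which still needs a trigger or a job. A small sketch, with the column name ShipmentTotal and the DECIMAL(12,2) type being my own assumptions:

```sql
-- Requires MySQL 5.7 or later. The value is computed from the row's own columns
-- and stored, so it can be indexed and summed without repeating the expression.
ALTER TABLE shipments
  ADD COLUMN ShipmentTotal DECIMAL(12,2)
    GENERATED ALWAYS AS (ItemCost + ShippingCost + HandlingCost + TaxCost) STORED;

-- The per-order aggregation is still a normal query (or a trigger-maintained column).
SELECT order_id, SUM(ShipmentTotal) AS TotalCost
FROM shipments
GROUP BY order_id;
```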
