Often I deal with aggregate or parent entities which have attributes derived from their constituent or children members. For example:
The byte_count and packet_count of a TcpConnection object are computed from the same attributes of its two constituent TcpStream objects, which in turn are computed from their constituent TcpPacket objects.
An Invoices object might have a total which is basically the SUM() of its constituent InvoiceLineItems' prices, with a little freight, discount and tax logic thrown in.
When dealing with millions of packets or millions of invoiced line items (I wish!), on-demand computation of these derived attributes -- either in a VIEW or more commonly in presentation logic like reports or web interfaces -- is often unacceptably slow.
How do you decide, before performance concerns force your hand, whether to "promote" derived attributes to precomputed fields?
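To make the on-demand option concrete, here is a minimal sketch of a total computed in a VIEW, using Python's built-in sqlite3 module. The table names, column names, and the use of integer cents are assumptions made for illustration, and the freight, discount and tax logic is left out.

```python
import sqlite3

# Hypothetical schema: invoices and invoice_line_items are illustrative names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY);
CREATE TABLE invoice_line_items (
    id INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(id),
    price_cents INTEGER NOT NULL
);
-- The derived attribute is recomputed every time the view is read.
CREATE VIEW invoice_totals AS
SELECT i.id AS invoice_id,
       COALESCE(SUM(li.price_cents), 0) AS total_cents
FROM invoices i
LEFT JOIN invoice_line_items li ON li.invoice_id = i.id
GROUP BY i.id;
""")

conn.execute("INSERT INTO invoices (id) VALUES (1), (2)")
conn.executemany(
    "INSERT INTO invoice_line_items (invoice_id, price_cents) VALUES (?, ?)",
    [(1, 1000), (1, 250), (2, 999)],
)

# Each read of the view scans and sums the matching line items.
on_demand_totals = dict(
    conn.execute("SELECT invoice_id, total_cents FROM invoice_totals")
)
print(on_demand_totals)
```

Nothing is stored twice here, so the total cannot drift from the line items; the cost is paid on every read instead.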
I personally wouldn't denormalize until performance trade-offs force my hand (because the downsides of denormalization are too drastic IMHO), but you might also consider:
Ref: The Database Programmer: The Argument for Denormalization. Be sure to also read his article on Keeping Denormalized Values Correct; his recommendation is to use triggers. That brings home the kind of trade-off denormalization requires.
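As a rough illustration of what the trigger approach involves, the sketch below keeps a precomputed total in step with its line items using SQLite triggers, driven from Python's sqlite3 module. It is not taken from the referenced articles; the schema and trigger names are hypothetical, and it shows one trigger per kind of write, which is exactly the extra machinery the trade-off refers to.

```python
import sqlite3

# Hypothetical schema; total_cents is the denormalized, precomputed field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    total_cents INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE invoice_line_items (
    id INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(id),
    price_cents INTEGER NOT NULL
);
CREATE TRIGGER line_item_insert AFTER INSERT ON invoice_line_items
BEGIN
    UPDATE invoices SET total_cents = total_cents + NEW.price_cents
    WHERE id = NEW.invoice_id;
END;
CREATE TRIGGER line_item_delete AFTER DELETE ON invoice_line_items
BEGIN
    UPDATE invoices SET total_cents = total_cents - OLD.price_cents
    WHERE id = OLD.invoice_id;
END;
CREATE TRIGGER line_item_update
AFTER UPDATE OF price_cents, invoice_id ON invoice_line_items
BEGIN
    -- Remove the old contribution, then add the new one; this also
    -- covers a line item being moved to a different invoice.
    UPDATE invoices SET total_cents = total_cents - OLD.price_cents
    WHERE id = OLD.invoice_id;
    UPDATE invoices SET total_cents = total_cents + NEW.price_cents
    WHERE id = NEW.invoice_id;
END;
""")

conn.execute("INSERT INTO invoices (id) VALUES (1)")
conn.execute(
    "INSERT INTO invoice_line_items (id, invoice_id, price_cents) "
    "VALUES (1, 1, 1000), (2, 1, 250)"
)
conn.execute("UPDATE invoice_line_items SET price_cents = 300 WHERE id = 2")
conn.execute("DELETE FROM invoice_line_items WHERE id = 1")

# The stored value should match what an on-demand SUM() would return.
stored_total = conn.execute(
    "SELECT total_cents FROM invoices WHERE id = 1"
).fetchone()[0]
recomputed_total = conn.execute(
    "SELECT COALESCE(SUM(price_cents), 0) FROM invoice_line_items "
    "WHERE invoice_id = 1"
).fetchone()[0]
print(stored_total, recomputed_total)
```

Reads become a single-row lookup, but every write path to the line items now has to be covered by a trigger, and a missed case silently corrupts the stored total.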
Basically, you don't. You let performance concerns force your hand.
That's the best answer because 99% of the time you should not be pre-optimizing like this; it's better to just calculate it on the fly.
However, it is quite common for client-application developers to come to the server side with mistaken preconceptions like "on-demand computation of ...derived attributes... -- is often unacceptably slow", and this just IS NOT true. The correct wording here would be "is rarely unacceptably slow".
As such, unless you are an expert in this (a DB Development Architect, etc.), you should not be engaging in premature optimization. Wait until it's obvious that it has to be fixed, then look at pre-aggregation.
How current the data must be determines how you implement it, really.
I'll assume 2 simple states: current or not current.
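One possible reading of the "not current" state is a summary table that is rebuilt on a schedule rather than on every write. The sketch below is written under that assumption, with hypothetical table names and a hypothetical refresh_stream_stats helper, using Python's sqlite3 module.

```python
import sqlite3

# Hypothetical schema: tcp_stream_stats is a periodically rebuilt summary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tcp_packets (
    id INTEGER PRIMARY KEY,
    stream_id INTEGER NOT NULL,
    byte_count INTEGER NOT NULL
);
CREATE TABLE tcp_stream_stats (
    stream_id INTEGER PRIMARY KEY,
    packet_count INTEGER NOT NULL,
    byte_count INTEGER NOT NULL
);
""")


def refresh_stream_stats(conn):
    """Rebuild the summary table from scratch inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM tcp_stream_stats")
        conn.execute("""
            INSERT INTO tcp_stream_stats (stream_id, packet_count, byte_count)
            SELECT stream_id, COUNT(*), SUM(byte_count)
            FROM tcp_packets
            GROUP BY stream_id
        """)


def stats_for(conn, stream_id):
    return conn.execute(
        "SELECT packet_count, byte_count FROM tcp_stream_stats "
        "WHERE stream_id = ?",
        (stream_id,),
    ).fetchone()


conn.executemany(
    "INSERT INTO tcp_packets (stream_id, byte_count) VALUES (?, ?)",
    [(1, 100), (1, 40), (2, 60)],
)
refresh_stream_stats(conn)

# A packet that arrives after the refresh is not reflected yet.
conn.execute("INSERT INTO tcp_packets (stream_id, byte_count) VALUES (1, 10)")
stale_stats = stats_for(conn, 1)

refresh_stream_stats(conn)
fresh_stats = stats_for(conn, 1)
print(stale_stats, fresh_stats)
```

If readers can tolerate figures that lag by one refresh interval, this avoids per-write trigger overhead; if they cannot, the value has to be maintained at write time or computed on demand.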
That said, I would develop against the same quantity of data as I have in prod so I have some confidence in response time. You should rarely be surprised by your code performance...