
Guidance on precomputed SQL attributes

Original post: 2010-02-05 21:33:10 | Tags: sql, precompute

Often I deal with aggregate or parent entities which have attributes derived from their constituent or child members. For example:

The byte_count and packet_count of a TcpConnection object are computed from the same attributes of its two constituent TcpStream objects, which in turn are computed from their constituent TcpPacket objects.
An Invoices object might have a total which is basically the SUM() of its constituent InvoiceLineItems' prices, with a little freight, discount and tax logic thrown in.

When dealing with millions of packets or millions of invoiced line items (I wish!), on-demand computation of these derived attributes -- either in a VIEW or, more commonly, in presentation logic like reports or web interfaces -- is often unacceptably slow.
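For concreteness, here is a minimal sketch of what "on-demand computation in a VIEW" could look like for the invoice case. It uses Python's built-in sqlite3 module only so that it runs as-is; the table, column, and view names (invoices, invoice_line_items, invoice_totals, freight_cents, price_cents) are assumptions made for illustration and do not come from the question.

```python
import sqlite3

# Hypothetical schema: every table, column and view name here is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        freight_cents INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE invoice_line_items (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER NOT NULL REFERENCES invoices(id),
        price_cents INTEGER NOT NULL
    );
    -- The derived attribute is recomputed every time the view is queried.
    CREATE VIEW invoice_totals AS
        SELECT i.id AS invoice_id,
               i.freight_cents + COALESCE(SUM(li.price_cents), 0) AS total_cents
        FROM invoices i
        LEFT JOIN invoice_line_items li ON li.invoice_id = i.id
        GROUP BY i.id, i.freight_cents;
""")

conn.execute("INSERT INTO invoices (id, freight_cents) VALUES (1, 500)")
conn.executemany(
    "INSERT INTO invoice_line_items (invoice_id, price_cents) VALUES (?, ?)",
    [(1, 1000), (1, 2500)],
)

# Nothing is stored for the total; the aggregate runs at query time.
total = conn.execute(
    "SELECT total_cents FROM invoice_totals WHERE invoice_id = 1"
).fetchone()[0]
print(total)
```

The total is never stale with this approach, but every read pays for the aggregation over the line items.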

How do you decide, before performance concerns force your hand, whether to "promote" derived attributes to precomputed fields?

3 Answers

I personally wouldn't denormalize until performance trade-offs force my hand (because the downsides of denormalization are too drastic IMHO), but you might also consider:

Convenience: e.g. if two different client apps want to calculate the same derived attributes, they both have to code up the queries to calculate them. Denormalization offers both client apps the derived attribute in a simpler way.
Stability over time: e.g. if the formula for calculating a derived attribute is changeable, denormalization allows you to capture and store the derived value at a point in time so future calculations will never get it wrong.
Simpler queries: adding complexity to the DB structure can mean your SELECT query is simpler at the client end.
Performance: SELECT queries on denormalized data can be quicker.
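The "stability over time" point can be illustrated with a small sketch: the total is computed once, when the invoice is finalized, and stored, so a later change to the formula's inputs does not alter it. The schema, the tax formula, and the helper names (total_with_tax, finalize_invoice) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "  id INTEGER PRIMARY KEY,"
    "  subtotal_cents INTEGER NOT NULL,"
    "  total_cents INTEGER"  # stored (denormalized) derived value
    ")"
)

def total_with_tax(subtotal_cents, tax_rate_percent):
    # Hypothetical formula; integer arithmetic avoids float rounding.
    return subtotal_cents + subtotal_cents * tax_rate_percent // 100

def finalize_invoice(conn, invoice_id, tax_rate_percent):
    # Capture the derived value using the rate in force right now.
    subtotal = conn.execute(
        "SELECT subtotal_cents FROM invoices WHERE id = ?", (invoice_id,)
    ).fetchone()[0]
    conn.execute(
        "UPDATE invoices SET total_cents = ? WHERE id = ?",
        (total_with_tax(subtotal, tax_rate_percent), invoice_id),
    )

conn.execute("INSERT INTO invoices (id, subtotal_cents) VALUES (1, 10000)")
finalize_invoice(conn, 1, tax_rate_percent=8)

stored = conn.execute(
    "SELECT total_cents FROM invoices WHERE id = 1"
).fetchone()[0]
# If the rate later changes to 10%, recomputing on demand gives a different
# answer, while the stored value still reflects the invoice as issued.
recomputed_later = total_with_tax(10000, tax_rate_percent=10)
print(stored, recomputed_later)
```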

Ref: The Database Programmer: The Argument for Denormalization. Be sure to also read his article on Keeping Denormalized Values Correct; his recommendation is to use triggers. That brings home the kind of trade-off denormalization requires.
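To give a feel for that trade-off, here is a rough sketch of trigger-maintained totals. It is not taken from the referenced articles; it uses SQLite trigger syntax through Python's sqlite3 module so it can be run directly, and the schema and trigger names are assumptions. Other databases use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        total_cents INTEGER NOT NULL DEFAULT 0  -- precomputed attribute
    );
    CREATE TABLE invoice_line_items (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER NOT NULL REFERENCES invoices(id),
        price_cents INTEGER NOT NULL
    );

    -- Every write path to the child table needs its own trigger.
    CREATE TRIGGER line_item_insert AFTER INSERT ON invoice_line_items
    BEGIN
        UPDATE invoices SET total_cents = total_cents + NEW.price_cents
        WHERE id = NEW.invoice_id;
    END;

    CREATE TRIGGER line_item_delete AFTER DELETE ON invoice_line_items
    BEGIN
        UPDATE invoices SET total_cents = total_cents - OLD.price_cents
        WHERE id = OLD.invoice_id;
    END;

    CREATE TRIGGER line_item_update
    AFTER UPDATE OF price_cents, invoice_id ON invoice_line_items
    BEGIN
        UPDATE invoices SET total_cents = total_cents - OLD.price_cents
        WHERE id = OLD.invoice_id;
        UPDATE invoices SET total_cents = total_cents + NEW.price_cents
        WHERE id = NEW.invoice_id;
    END;
""")

def stored_total(conn, invoice_id):
    # Reads the precomputed column; no aggregation at query time.
    return conn.execute(
        "SELECT total_cents FROM invoices WHERE id = ?", (invoice_id,)
    ).fetchone()[0]

conn.execute("INSERT INTO invoices (id) VALUES (1)")
conn.execute("INSERT INTO invoice_line_items (id, invoice_id, price_cents) VALUES (1, 1, 1000)")
conn.execute("INSERT INTO invoice_line_items (id, invoice_id, price_cents) VALUES (2, 1, 2500)")
total_after_insert = stored_total(conn, 1)

conn.execute("UPDATE invoice_line_items SET price_cents = 1200 WHERE id = 1")
total_after_update = stored_total(conn, 1)

conn.execute("DELETE FROM invoice_line_items WHERE id = 2")
total_after_delete = stored_total(conn, 1)
print(total_after_insert, total_after_update, total_after_delete)
```

Reads become a single-column lookup, but three triggers now have to stay in step with any future change to the total's formula, which is the cost side of the trade-off.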

Basically, you don't. You let performance concerns force your hand.

That's the best answer because 99% of the time you should not be pre-optimizing like this; it's better to just calculate it on the fly.

However, it is quite common for client-application developers to come to the server side with mistaken preconceptions like "on-demand computation of ... derived attributes ... -- is often unacceptably slow", and this just IS NOT true. The correct wording here would be "is rarely unacceptably slow".

As such, unless you are an expert in this (a DB Development Architect, etc.), you should not be engaging in premature optimization. Wait until it's obvious that it has to be fixed, then look at pre-aggregation.

How current the data must be really determines how you implement it.

I'll assume 2 simple states: current or not current.

Current: indexed views, triggers, stored procs to maintain aggregate tables, etc.
Not current: Reporting Services snapshots, log shipping/replication, data warehouse, etc.
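As a rough illustration of the "not current" end of that spectrum, the sketch below keeps a summary table that is rebuilt on demand (in practice, on a schedule), so reads are cheap but may lag behind the detail rows. It is a simplified stand-in, not how the named products work; the tables and helper functions (tcp_packets, stream_summary, refresh_stream_summary, summary_for) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tcp_packets (
        id INTEGER PRIMARY KEY,
        stream_id INTEGER NOT NULL,
        byte_count INTEGER NOT NULL
    );
    CREATE TABLE stream_summary (
        stream_id INTEGER PRIMARY KEY,
        packet_count INTEGER NOT NULL,
        byte_count INTEGER NOT NULL
    );
""")

def refresh_stream_summary(conn):
    # Rebuild the whole summary; commits on success, rolls back on error.
    with conn:
        conn.execute("DELETE FROM stream_summary")
        conn.execute(
            "INSERT INTO stream_summary (stream_id, packet_count, byte_count) "
            "SELECT stream_id, COUNT(*), SUM(byte_count) "
            "FROM tcp_packets GROUP BY stream_id"
        )

def summary_for(conn, stream_id):
    return conn.execute(
        "SELECT packet_count, byte_count FROM stream_summary WHERE stream_id = ?",
        (stream_id,),
    ).fetchone()

conn.executemany(
    "INSERT INTO tcp_packets (stream_id, byte_count) VALUES (?, ?)",
    [(1, 100), (1, 200)],
)
refresh_stream_summary(conn)
before = summary_for(conn, 1)

# A new packet arrives; the summary is stale until the next refresh.
conn.execute("INSERT INTO tcp_packets (stream_id, byte_count) VALUES (1, 50)")
stale = summary_for(conn, 1)

refresh_stream_summary(conn)
fresh = summary_for(conn, 1)
print(before, stale, fresh)
```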

That said, I would develop against the same quantity of data as I have in production, so that I have some confidence in response time. You should rarely be surprised by your code's performance...
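One way to act on this is to load a realistic row count into a development database and time the on-demand query before deciding anything. The sketch below does that with synthetic data; ROW_COUNT, the tcp_packets schema, and the timed_query helper are assumptions, and the row count should be replaced with whatever production actually holds.

```python
import sqlite3
import time

ROW_COUNT = 200_000  # assumption: set this to your production row count

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tcp_packets ("
    "  id INTEGER PRIMARY KEY,"
    "  stream_id INTEGER NOT NULL,"
    "  byte_count INTEGER NOT NULL"
    ")"
)
# Synthetic rows: two streams, byte counts cycling through 0..1499.
conn.executemany(
    "INSERT INTO tcp_packets (stream_id, byte_count) VALUES (?, ?)",
    ((i % 2, i % 1500) for i in range(ROW_COUNT)),
)
conn.commit()

def timed_query(conn, sql):
    # Returns the fetched rows and the wall-clock time the query took.
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

rows, elapsed = timed_query(
    conn, "SELECT COUNT(*), SUM(byte_count) FROM tcp_packets"
)
packet_count, byte_count = rows[0]
print(f"{packet_count} packets, {byte_count} bytes, {elapsed:.4f}s")
```

An in-memory SQLite database will not behave like a production server, so the measured time is only meaningful when the same measurement is run against the real engine and hardware.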