What is the maximum recommended number of rows that a SQL Server 2008 R2 standalone server should store in a single table?

Original post: 2010-12-20 18:21:19. Tags: sql, sql-server, database, performance, large-data-volumes

I'm designing my database for functionality and performance in real-time AJAX web applications, and I don't currently have the resources to add database server redundancy or load balancing.

Unfortunately, one table in my database could end up storing hundreds of millions of rows, and it will need fast reads and writes so the web interface doesn't lag.

Most, if not all, of the columns in this table are individually indexed, and I'd love to know if there are other ways to ease the burden on the server when running queries on large tables. But is there eventually a cap on the size of a table (in rows or GB) before a single, non-clustered SQL Server starts to choke?

My database only has a dozen tables, with maybe a couple dozen foreign key relationships. None of my tables has more than 8 or so columns, and only one or two of them will end up storing a large number of rows. Hopefully the simplicity of my database will make up for the massive amount of data in these couple of tables...

3 answers

The only limit is the size of your primary key. Is it an INT or a BIGINT?
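To put rough numbers on the INT vs BIGINT question, here is a back-of-the-envelope calculation in Python. It assumes an IDENTITY(1,1) key that starts at 1 (so only the positive range is used) and a hypothetical steady load of one million inserts per day; both are assumptions for illustration, not figures from the question.

```python
# Largest positive values of SQL Server's signed integer types.
INT_MAX = 2**31 - 1      # INT is a 4-byte signed integer
BIGINT_MAX = 2**63 - 1   # BIGINT is an 8-byte signed integer

def years_until_exhausted(max_value: int, rows_per_day: int) -> float:
    """Years before a key seeded at 1 and incremented by 1 runs out at a steady insert rate."""
    return max_value / rows_per_day / 365.25

# Hypothetical load: 1 million inserts per day.
print(f"INT:    {years_until_exhausted(INT_MAX, 1_000_000):,.1f} years")
print(f"BIGINT: {years_until_exhausted(BIGINT_MAX, 1_000_000):,.1f} years")
```

At that rate an INT key lasts roughly six years, while a BIGINT key effectively never runs out.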

SQL Server will happily store the data without a problem. However, with hundreds of millions of rows, you're best off partitioning the data. There are many good articles on this, such as this article.
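Here is a small Python sketch of how range partitioning routes rows, loosely modeled on a SQL Server partition function declared with RANGE RIGHT (each boundary value is the first value of the partition to its right). The quarterly boundary dates and the function names are hypothetical. The second function shows why partitioning helps queries: a date-range filter only needs to touch the partitions that overlap the range.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical quarterly boundaries. With RANGE RIGHT semantics, each boundary
# value belongs to the partition on its right.
BOUNDARIES = [date(2010, 1, 1), date(2010, 4, 1), date(2010, 7, 1), date(2010, 10, 1)]

def partition_number(value, boundaries=BOUNDARIES):
    """Return the 1-based partition number for value; n boundaries give n + 1 partitions."""
    return bisect_right(boundaries, value) + 1

def partitions_for_range(lo, hi, boundaries=BOUNDARIES):
    """Partitions a query filtering lo <= key <= hi must read; the rest can be skipped."""
    return list(range(partition_number(lo, boundaries), partition_number(hi, boundaries) + 1))

print(partition_number(date(2009, 12, 31)))                        # 1
print(partition_number(date(2010, 1, 1)))                          # 2 (boundary goes right)
print(partitions_for_range(date(2010, 5, 1), date(2010, 8, 15)))   # [3, 4]
```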

With partitions, you can have one thread per partition working at the same time, parallelizing the query even more than is possible without partitioning.
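The shape of that per-partition parallelism can be sketched in Python: each partition is aggregated independently on its own worker, and the partial results are combined at the end. This only models the idea with in-memory lists of hypothetical (id, amount) rows; because of Python's GIL these threads won't actually speed up CPU-bound work, whereas the database engine's own parallel plans can.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(rows):
    """Aggregate one partition independently of the others."""
    return sum(amount for _, amount in rows)

def parallel_total(partitions):
    """Give each partition its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(partitions) or 1) as pool:
        return sum(pool.map(partial_sum, partitions))

# Hypothetical (id, amount) rows already split into three partitions.
partitions = [
    [(1, 10), (2, 20)],
    [(3, 5)],
    [(4, 1), (5, 2), (6, 3)],
]
print(parallel_total(partitions))  # 41
```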

Rows are limited strictly by the amount of disk space you have available. We have SQL Servers with hundreds of millions of rows of data in them. Of course, those servers are rather large.

In order to keep the web interface snappy you will need to think about how you access that data.

One example is to stay away from aggregate queries that require processing large swaths of data. Things like SUM() can be a killer, depending on how much data they have to process. In these situations you are much better off calculating any summary or grouped data ahead of time and letting your site query these analytic tables.
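A minimal sketch of that precomputed-summary approach, using Python's built-in sqlite3 as a stand-in for SQL Server. The `events` and `daily_totals` tables are hypothetical. A scheduled job rebuilds the small summary table, and the web tier reads from it instead of running SUM() over the detail rows on every request.

```python
import sqlite3

# SQLite stands in for SQL Server; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, day TEXT NOT NULL, amount INTEGER NOT NULL);
CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total INTEGER NOT NULL, n INTEGER NOT NULL);
""")

def refresh_daily_totals(conn):
    """Rebuild the summary table from the detail table (e.g. from a scheduled job)."""
    with conn:  # one transaction, so readers never see a half-built summary
        conn.execute("DELETE FROM daily_totals")
        conn.execute("INSERT INTO daily_totals (day, total, n) "
                     "SELECT day, SUM(amount), COUNT(*) FROM events GROUP BY day")

conn.executemany("INSERT INTO events (day, amount) VALUES (?, ?)",
                 [("2010-12-19", 5), ("2010-12-19", 7), ("2010-12-20", 3)])
refresh_daily_totals(conn)

# The site queries the small summary table, not the large detail table.
print(conn.execute("SELECT day, total, n FROM daily_totals ORDER BY day").fetchall())
# [('2010-12-19', 12, 2), ('2010-12-20', 3, 1)]
```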

Next you'll need to partition the data. Split those partitions across different drive arrays; that makes it easier to parallelize reads when SQL Server needs to go to disk. (@Simon touched on this.)

Basically, the problem boils down to how much data you need to access at any one time. This is the main problem regardless of the amount of data you have on disk. Even small databases can choke if the drives are slow and the database server doesn't have enough RAM to keep enough of the database in memory.
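A quick way to reason about this is to compare the size of the regularly accessed ("hot") slice of the table with the RAM the server can use for caching data pages (SQL Server's buffer pool). The Python sketch below does that arithmetic; the row count, average row size, hot fraction and RAM figure are all hypothetical placeholders, and the estimate ignores page overhead and index storage, so treat it as a lower bound.

```python
def estimated_size_gb(rows: int, avg_row_bytes: int) -> float:
    """Rough size in GB (10**9 bytes): rows x average row size.

    Ignores page headers, fill factor and fragmentation, so it is a lower bound.
    Nonclustered indexes need their own pages on top of this.
    """
    return rows * avg_row_bytes / 1e9

def hot_set_fits(total_rows, hot_fraction, avg_row_bytes, ram_for_cache_gb):
    """True if the regularly accessed slice of the table fits in the RAM available for caching."""
    hot_gb = estimated_size_gb(int(total_rows * hot_fraction), avg_row_bytes)
    return hot_gb <= ram_for_cache_gb

# Hypothetical figures: 300 million rows, ~60 bytes per row, 8 GB of RAM for the cache.
print(estimated_size_gb(300_000_000, 60))       # 18.0
print(hot_set_fits(300_000_000, 0.05, 60, 8))   # True  (~0.9 GB hot)
print(hot_set_fits(300_000_000, 1.00, 60, 8))   # False (~18 GB hot)
```

The point is the ratio rather than the exact numbers: if only a small slice is hot, a large table can still be served mostly from memory.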

Usually, in systems like this, most of the data is basically inert, meaning it's rarely accessed. For example, a purchase order (PO) system might keep a history of every invoice ever created, but day to day it only deals with the active ones.

If your system has similar requirements, you might keep one table for active records and archive them to another table as part of a nightly process. You could even recompute statistics such as monthly averages as part of that archival.
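A minimal sketch of such a nightly job, again with Python's sqlite3 standing in for SQL Server. The invoice tables, the 'closed' status and the monthly_stats table are all hypothetical. Everything runs in a single transaction, so rows are never lost or duplicated if the job fails partway through.

```python
import sqlite3

# SQLite stands in for SQL Server; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices_active  (id INTEGER PRIMARY KEY, month TEXT, amount REAL, status TEXT);
CREATE TABLE invoices_archive (id INTEGER PRIMARY KEY, month TEXT, amount REAL, status TEXT);
CREATE TABLE monthly_stats    (month TEXT PRIMARY KEY, avg_amount REAL, n INTEGER);
""")

def nightly_archive(conn):
    """Move closed invoices out of the active table and recompute monthly averages."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("INSERT INTO invoices_archive "
                     "SELECT * FROM invoices_active WHERE status = 'closed'")
        conn.execute("DELETE FROM invoices_active WHERE status = 'closed'")
        conn.execute("DELETE FROM monthly_stats")
        conn.execute("INSERT INTO monthly_stats (month, avg_amount, n) "
                     "SELECT month, AVG(amount), COUNT(*) FROM invoices_archive GROUP BY month")

with conn:
    conn.executemany("INSERT INTO invoices_active VALUES (?, ?, ?, ?)", [
        (1, "2010-11", 100.0, "closed"),
        (2, "2010-11", 50.0, "closed"),
        (3, "2010-12", 80.0, "open"),
    ])

nightly_archive(conn)
print(conn.execute("SELECT id FROM invoices_active").fetchall())  # [(3,)]
print(conn.execute("SELECT * FROM monthly_stats").fetchall())     # [('2010-11', 75.0, 2)]
```

The active table stays small, so the queries behind the web interface keep hitting a compact table while the history grows elsewhere.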

Just some thoughts.

My gut tells me that you will probably be okay, but you'll have to deal with performance. It will depend on what retrieval time is acceptable for your queries.

For your table with the "hundreds of millions of rows", what percentage of the data is accessed regularly? Is some of the data rarely accessed? Do some users access one subset of the data while other users select different data? You may benefit from data partitioning.