How to speed up a query running on a table with 10 million records

Question

I am using the query below to retrieve records from a table in a SQL database. The report aggregates the sum of values for a list of entries. The end result is a report listing around 2,500 clients from around 7 million records.

select customer_id, sum(value) as value 
from `data` 
where ((`date` >= '2020-05-11' and `date` <= '2020-06-9')) 
group by `customer_id` 
order by `value` desc, `customer_id` asc;

This report takes about 60 seconds to generate if I pick the whole year as the date range. The report also has a customization feature that lets users add up to 3 specific columns to it. So apart from the simple listing, users can choose to see which media, product categories, and product sectors the sales are coming from.

I want to reduce the time it takes to generate the report, and I am thinking about creating an extra table that simply holds grouped entries: the current aggregated value for each client, along with the 3 fields mentioned in the previous paragraph. This basically means that my DB will shrink from about 7 million records to around 2.5 million records. In addition, the sums will already have been calculated in the table, so that will save time as well.

Do you think this extra table will make a difference (I assume it will)? Any other suggestions?

Added after some comments. Interesting comments, to say the least. To make things even more challenging, let me add some more detail.

I am running the same DB content on two different servers. The original DB contains only one master table with 7 million records and no indexing at all, so all searches are text based. Still, most queries run at an acceptable speed. The second DB feeds from the master table in the original DB. This DB is split into smaller tables with the proper indexes, and the queries on this DB take a bit longer than the same queries on the original DB.

My main question, however, is still this: if I create a new table that lists only aggregate sums by client, along with the 3 fields mentioned in my first paragraph, will that make a difference?

Let me illustrate this with an actual example. One client buys 15 different products which belong to two different product categories. This transaction adds 15 records to the original sales table. My new sales table will only list the sum of the purchase by category, so it will only add 2 records. On a big scale this means that I will be able to shrink a table of 7 million records (and growing) to a table of 2 million records. So my question is: do you think this will speed up my queries?
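The summary-table idea from the example above can be sketched as follows. This is only a minimal illustration, not the asker's actual schema: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the table names (sales, sales_summary), column names, and category labels are all hypothetical. It also assumes the summary keeps one row per customer, day, and category, so that the report's date-range filter still works.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        customer_id INTEGER, sale_date TEXT, category TEXT, value REAL
    );
    CREATE TABLE sales_summary (
        customer_id INTEGER, sale_date TEXT, category TEXT, total REAL,
        PRIMARY KEY (customer_id, sale_date, category)
    );
""")

# One client buys 15 products spread over two (hypothetical) categories.
rows = [(1, "2020-05-11", "books" if i < 10 else "games", 10.0)
        for i in range(15)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Build the summary: one row per customer, day and category.
conn.execute("""
    INSERT INTO sales_summary
    SELECT customer_id, sale_date, category, SUM(value)
    FROM sales
    GROUP BY customer_id, sale_date, category
""")

def report(table, value_col, start, end):
    """Run the per-customer report against either table."""
    sql = f"""SELECT customer_id, SUM({value_col}) AS report_total
              FROM {table}
              WHERE sale_date >= ? AND sale_date <= ?
              GROUP BY customer_id
              ORDER BY report_total DESC, customer_id ASC"""
    return conn.execute(sql, (start, end)).fetchall()

detail_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
summary_count = conn.execute("SELECT COUNT(*) FROM sales_summary").fetchone()[0]
detail_report = report("sales", "value", "2020-05-11", "2020-06-09")
summary_report = report("sales_summary", "total", "2020-05-11", "2020-06-09")

print(detail_count, summary_count)  # 15 detail rows collapse into 2 summary rows
print(detail_report == summary_report)  # both tables give the same report
```

Both tables produce the same report, but the summary version reads fewer rows. The trade-off is that the summary has to be kept in sync with the detail table, for example by rebuilding it on a schedule or updating it on every insert.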

Answer 1

You can speed up your query with a covering index:

create index ix1 on data (date, customer_id, value);

This index will improve the performance of the query, assuming the date filter selects a limited number of rows, say no more than 0.5% of the rows in the table. However, your query seems to be processing a whole month of data. That's bound to be slow, no matter how you do it.
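One way to check whether such an index is actually picked up is to look at the query plan before and after creating it. The sketch below does this with Python's built-in sqlite3 module, which is an assumption made only to keep the example runnable; the real database will have its own EXPLAIN syntax and its own optimizer decisions. The sample data is made up, and the aggregate is aliased as total (rather than value) purely to avoid a name clash with the column.

```python
import sqlite3

# In-memory SQLite database with made-up sample data (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (customer_id INTEGER, date TEXT, value REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(i % 50, f"2020-{1 + i % 12:02d}-{1 + i % 28:02d}", 1.0)
     for i in range(5000)],
)

query = """SELECT customer_id, SUM(value) AS total FROM data
           WHERE date >= '2020-05-11' AND date <= '2020-06-09'
           GROUP BY customer_id
           ORDER BY total DESC, customer_id ASC"""

def plan(sql):
    """Return SQLite's query plan as a single readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)

plan_before = plan(query)  # no index yet: the table is scanned
conn.execute("CREATE INDEX ix1 ON data (date, customer_id, value)")
plan_after = plan(query)   # the index holds every column the query needs

print(plan_before)
print(plan_after)
```

Because the index contains date, customer_id, and value, the query can be answered from the index alone without touching the table rows, which is what makes it a covering index.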

Answer 2

At a minimum, make sure there is an index on [date].

Make sure you are comparing the same data type; otherwise the index will likely not be used. In your original code, you are comparing the [date] column to a string.

If [date] is a date, then you should do:

([date] >= convert( date, '2020-05-11') and [date] <= convert(date,'2020-06-9'))

If [date] is a string, then you should fix your second date by using '2020-06-09', not '2020-06-9', because the original will return all the days in June: strings are compared character by character, so '2020-06-30' sorts before '2020-06-9'. Also make sure the column really is always in YYYY-MM-DD format and not any other format.
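The effect of the missing zero is easy to reproduce outside the database. The short Python sketch below compares June dates first as plain strings and then as real dates; it assumes the database compares text values character by character in the same way, which holds for ordinary binary or ASCII-like collations.

```python
from datetime import date

# Every day in June 2020, formatted as YYYY-MM-DD text.
days = [f"2020-06-{d:02d}" for d in range(1, 31)]

# Text comparison: '9' sorts after '0'..'3', so every June day matches.
matched_unpadded = [d for d in days if d <= "2020-06-9"]
# With the zero-padded bound, only June 1 to June 9 match.
matched_padded = [d for d in days if d <= "2020-06-09"]
print(len(matched_unpadded), len(matched_padded))  # 30 9

# Comparing real dates avoids the problem entirely.
as_dates = [date.fromisoformat(d) for d in days]
matched_dates = [d for d in as_dates if d <= date(2020, 6, 9)]
print(len(matched_dates))  # 9
```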


Answer 1
1 2020-06-09 15:20:11

Answer 2
1 2020-06-09 15:47:28
