
How to speed up a query that runs on a table with 10 million records

I am using the query below to retrieve records from a table in a SQL database. The report aggregates the sum of values per customer. The end result is a report listing around 2,500 clients drawn from around 7 million records.

select customer_id, sum(value) as value 
from `data` 
where ((`date` >= '2020-05-11' and `date` <= '2020-06-9')) 
group by `customer_id` 
order by `value` desc, `customer_id` asc;

This report takes about 60 seconds to generate if I pick the whole year as the date range. The report also has a customization feature that lets users add up to 3 extra columns, so besides the plain listing, users can choose to see which media, product categories, and product sectors the sales come from.

I want to speed up the time it takes for the report to generate, and I am thinking about creating an extra table that simply holds the grouped entries: the current aggregated value for each client along with the 3 fields mentioned in the previous paragraph. This basically means that my DB will shrink from about 7 million records to around 2.5 million records. In addition, the sums will already have been calculated in that table, so that will save time as well.
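
Roughly, the summary table I have in mind would look like this (just a sketch; the media, category, and sector column names are placeholders for my 3 customizable fields):

create table data_summary (
    customer_id int not null,
    media varchar(50),       -- placeholder names for the 3 customizable fields
    category varchar(50),
    sector varchar(50),
    `date` date not null,    -- kept so date-range reports still work
    value decimal(12, 2) not null
);

-- collapse the raw sales rows into one pre-aggregated row per combination
insert into data_summary (customer_id, media, category, sector, `date`, value)
select customer_id, media, category, sector, `date`, sum(value)
from `data`
group by customer_id, media, category, sector, `date`;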

Do you think this extra table will make a difference? Any other suggestions?

Added after some comments. Interesting comments, to say the least. To make things even more challenging, let me add some more detail. I am running the same DB content on two different servers. The original DB contains only one master table with 7 million records and no indexing at all, so all searches are text based. Still, most queries run at an acceptable speed. The second DB feeds from the master table in the original DB. This DB is split into smaller tables with the proper indexes, and yet the queries on this DB take a bit longer than the same queries on the original DB.

Still, my main question is this: if I create a new table that lists only aggregate sums by client, along with the info for the 3 fields mentioned in my first paragraph, will that make a difference? Let me illustrate with an actual example. One client buys 15 different products which belong to two different product categories. My current sales table adds 15 records for this transaction. The new sales table would only list the sum of the purchase by category, so it would add just 2 records. On a large scale this means I would be able to shrink a 7 million record (and growing) table to about a 2 million record table. So my question is: do you think this will speed up my queries?
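
For example, the report query against such a summary table would then aggregate far fewer rows (again a sketch, using the hypothetical data_summary table and column names from above):

select customer_id, category, sum(value) as value
from data_summary
where `date` >= '2020-05-11' and `date` <= '2020-06-09'
group by customer_id, category
order by value desc, customer_id asc;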

You can speed up your query using a covering index:

create index ix1 on data (date, customer_id, value);

This index will improve the performance of the query, assuming it returns a limited number of rows, say a result set of no more than 0.5% of the table. However, your query seems to be processing a whole month of data. That's bound to be slow, no matter how you do it.
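
If this is MySQL (the backticks in the question suggest so), you can check whether the index actually covers the query with EXPLAIN; the Extra column should show "Using index" when no table lookups are needed:

explain
select customer_id, sum(value) as value
from `data`
where `date` >= '2020-05-11' and `date` <= '2020-06-09'
group by `customer_id`
order by `value` desc, `customer_id` asc;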

At a minimum, make sure there is an index on [date].
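
For example (the index name here is just a placeholder):

create index ix_data_date on data ([date]);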

Make sure you are comparing the same data type, otherwise the index will likely not be used. In your original code, you are comparing the [date] column to a string.

If [date] is a date, then you should do:

([date] >= convert( date, '2020-05-11') and [date] <= convert(date,'2020-06-9'))

If [date] is a string, then you should fix your second date by using '2020-06-09', not '2020-06-9', because the original will match all the days in June when compared as text. Also make sure it really is always stored as YYYY-MM-DD and not in any other format.
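
A habit that sidesteps the end-of-range pitfall entirely (and also keeps working if the column ever becomes a datetime) is a half-open range with zero-padded literals, for example:

([date] >= '2020-05-11' and [date] < '2020-06-10')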
