
Optimizing simple SQL query for large table

I have a query where one table has ~10 million rows and the other two are <20 in each table.

SELECT a.name, b.name, c.total
FROM smallTable1 a, smallTable2 b, largeTable c
WHERE c.id1 = a.id AND c.id2 = b.id;

largeTable has columns (id, id1, id2, total) and ~10 million rows

smallTable1 has columns (id, name)

smallTable2 has columns (id, name)

Right now it takes 5 seconds to run.
Is it possible to make it much faster?

Create indexes. They are the reason lookups are fast: without an index, the database has to scan every row of a table to find the matching ones.

So:

  1. Create an index on smallTable1(id)
  2. Create an index on smallTable2(id)
  3. Create one index on largeTable(id1) and another on largeTable(id2)
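
As a rough sketch, the three steps above could be written as the statements below. The index names are made up for illustration; only the table and column names come from the question. If id is already declared as the primary key of the small tables, most engines have indexed it already, so the first two statements may be unnecessary.

```sql
-- Hypothetical index names; adjust to your own naming convention.
CREATE INDEX ix_smallTable1_id ON smallTable1 (id);
CREATE INDEX ix_smallTable2_id ON smallTable2 (id);

-- Two separate single-column indexes on the large table.
CREATE INDEX ix_largeTable_id1 ON largeTable (id1);
CREATE INDEX ix_largeTable_id2 ON largeTable (id2);
```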

Important: you can also create a single index over more than one column, such as largeTable(id1, id2). DO NOT DO THAT, because it does not make sense in your case.

Next, your query is not outright wrong, but it does not follow best practice. Relational databases are based on set theory, so you should think in terms of "bags of marbles" instead of "cells in a table". Conceptually, your initial query translates to:

  1. Get EVERYTHING from largeTable c, smallTable1 a and smallTable2 b
  2. Once you have all of that, keep only the combinations where c.id1 = a.id AND c.id2 = b.id (there go your 5+ seconds, because this is semi-resource-intensive)

Ambrish has suggested the correct query. Use that, although it will not be faster.

Why? Because in the end, you still pull all the data from the table out of the database.

As for the data itself: 10 million records is not a ridiculously large table, but it is not small either. In data warehouses, the star schema is the standard, and a star schema is basically what you have. The problem you are actually facing is that the result has to be calculated on the fly, and that takes time. I'm telling you this because in corporate environments engineers face this problem on a daily basis, and the solution is OLAP (basically pre-calculated, pre-aggregated, pre-summarized, pre-everything data). The end users then just query this precalculated data and the query seems very fast, but the data is never 100% current, because there is a delay between OLTP (online transaction processing, the day-to-day database) and OLAP (online analytical processing, the reporting database).

Indexes will help with queries such as WHERE id = 3. But when you are joining every row and basically pulling everything from the DB, they probably won't play a significant role in your case.
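
To illustrate the pre-calculation (OLAP) idea described above, here is a minimal sketch. It assumes the report only needs one total per pair of names rather than every individual row, which the question does not state. The table name report_totals and the column aliases are hypothetical, and CREATE TABLE ... AS SELECT is not available in every engine (SQL Server, for instance, uses SELECT ... INTO instead).

```sql
-- Built once, or rebuilt on a schedule, instead of on every request.
-- report_totals, name1 and name2 are illustrative names.
CREATE TABLE report_totals AS
SELECT a.name       AS name1,
       b.name       AS name2,
       SUM(c.total) AS total
FROM largeTable c
JOIN smallTable1 a ON c.id1 = a.id
JOIN smallTable2 b ON c.id2 = b.id
GROUP BY a.name, b.name;

-- End users then read the small precomputed table.
SELECT name1, name2, total
FROM report_totals;
```

With fewer than 20 rows in each small table, the summary holds at most a few hundred rows. The trade-off is the delay already mentioned: the totals are only as fresh as the last rebuild.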

So, to make a long story short: if rewriting the query is your only option, it will be hard to make an improvement.

There is one circumstance under which separately indexing id1 and id2 in the large table will make less of a difference. If there are 9,000,000 rows with id1 matching smallTable1.id and 200 rows with id2 matching smallTable2.id, and those 200 are the only rows where both match at the same time, you will still be doing almost a complete table/index scan. If that is the case, creating a single index on both id1 and id2 should speed things up, as the database can then locate those 200 rows with index seeks.

If that works, you may want to include Total in that index to make it a covering index for that table.
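
A minimal sketch of such an index follows. The index name is hypothetical. Appending total as a trailing key column is the portable way to make the index covering. Some engines, such as SQL Server and PostgreSQL, also offer an INCLUDE clause for carrying non-key columns, which may be preferable there.

```sql
-- Hypothetical index name. Keyed on both foreign keys, with total appended
-- so the large-table side of the join can be answered from the index alone.
CREATE INDEX ix_largeTable_id1_id2_total
    ON largeTable (id1, id2, total);
```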

This solution (assuming it is one) would be extremely data-centric and thus the execution would change if the data changes significantly.

Whatever you decide to do, I would suggest you make one change (create an index or whatever) then check the execution plan. Make another change and check the execution plan. Make another change and check the execution plan. Repeat or rewind as needed.
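
How you view the plan depends on the engine, which the question does not name. As a sketch, assuming an engine that accepts EXPLAIN as a statement prefix (MySQL and PostgreSQL do), you could run this after each change and compare the output:

```sql
-- Shows the plan the optimizer chose without returning the result rows.
-- SQLite uses EXPLAIN QUERY PLAN, Oracle uses EXPLAIN PLAN FOR,
-- and SQL Server exposes plans through its own SHOWPLAN settings.
EXPLAIN
SELECT a.name, b.name, c.total
FROM smallTable1 a
JOIN largeTable c ON c.id1 = a.id
JOIN smallTable2 b ON c.id2 = b.id;
```

Look for whether largeTable is read with a full scan or through one of the new indexes, and keep a change only if the plan, and the measured time, actually improve.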

Use explicit JOINs instead of joining in the WHERE clause:

SELECT a.name, b.name, c.total
FROM smallTable1 a
JOIN largeTable c ON c.id1 = a.id
JOIN smallTable2 b ON c.id2 = b.id;

And create indexes on largeTable(id1) and largeTable(id2).


 