
How is the performance of the database affected if a query does a full table scan on a table of 5.5 million records?

Could anyone help me with some advice related to database performance please?

Here's the scenario:

We use a SQL Server database at work and we store over 5.5 million profiles in a table (molecular biology stuff). The average growth rate is about 0.5 million profiles per year. Each profile consists of 21 varchar fields with an average length of about 5 characters per field. They are character fields because they can also store literal values: "mix/mix", "nr", "nd", "del/del".

A team of medical scientists has asked me to implement a search utility that lets them build queries dynamically to search for profiles using various combinations of these fields. My dilemma is how to minimise the impact on the performance of the database, as I expect such a query would do a full table scan most of the time. I cannot predict which combinations of fields the scientists will use. They could use something like this:

WHERE field2 = '18/11'
  AND field7 = '12.7/15'
  AND field8 LIKE '%12%'
  AND field12 = '12/8.3'

or something like this:

WHERE field1 = 'X/Y'
  AND field5 IN ('12/12','12/13','12/14','13/13','13/14','14/14')
  AND field10 = '12.7/15'
  AND field12 IN ('11/11','11/12','11/13','11/14','11/15','12/12')
  ...
  AND field21 = '9/11.8'

and many other possible combinations...

The query, in its various permutations, consistently takes about 1.5 minutes to execute. That in itself is acceptable to the scientists, but what worries me is how this will affect the performance of the database. Is it going to hog the CPU, and will the database become unresponsive to other medical staff while the utility is executing the query?

Any advice is greatly appreciated. Many thanks!

You probably want to implement indexes to speed up the queries. Which indexes to create depends on the queries that are likely to be generated.

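For instance, the first query suggests an index on (field2, field7, field12). The LIKE condition on field8 starts with a wildcard, so it cannot seek on an index; it would be applied as a filter on the rows that the index lookup returns.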
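The effect of such an index can be tried out on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the plan text and the optimizer differ from what SQL Server would show; the table name `profiles`, the index name, and the helper `plan` are made up for illustration. It only demonstrates the general principle: the three equality conditions can use the composite index, while the leading-wildcard LIKE on its own forces a scan.

```python
import sqlite3

# In-memory SQLite database standing in for the real SQL Server table.
# Table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles ("
    "id INTEGER PRIMARY KEY, field2 TEXT, field7 TEXT, field8 TEXT, field12 TEXT)"
)
conn.execute(
    "CREATE INDEX ix_profiles_f2_f7_f12 ON profiles (field2, field7, field12)"
)


def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


# Equality on all three indexed columns: the planner can search the index.
equality_plan = plan(
    "SELECT id FROM profiles "
    "WHERE field2 = '18/11' AND field7 = '12.7/15' AND field12 = '12/8.3'"
)

# A leading wildcard gives the planner nothing to seek on: it scans.
wildcard_plan = plan("SELECT id FROM profiles WHERE field8 LIKE '%12%'")

print(equality_plan)
print(wildcard_plan)
```

On SQL Server the equivalent check would be to compare the actual execution plans of the two queries before and after creating the index.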

Indexes do incur additional costs for data modifications (insert, update, delete). However, you seem to have a low update volume, so this is probably not a big issue.

Another possibility is to transform the query conditions in such a way that you can use full-text search. This would require modifying the values in the queries to be more full-text indexable -- say, not starting values with numbers and replacing slashes with something else. But your queries would fit in well with such an index, if you really need performance.

Finally, investing in more memory might also be worthwhile. It sounds like your table would fit into a handful of gigabytes of memory, and a full table scan of such a table in memory should be faster than what you see now.
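A rough back-of-the-envelope estimate supports the idea that the table fits in memory. The row count, field count, average length, and growth rate come from the question; the per-column and per-row overhead figures are assumptions for illustration only, and indexes would add to the total.

```python
# Figures taken from the question.
rows = 5_500_000
fields = 21
avg_chars = 5                 # average characters per varchar field
growth_per_year = 500_000

# Assumed storage overheads (illustrative, not measured on a real table).
overhead_per_field = 2        # bytes of bookkeeping per variable-length column
overhead_per_row = 11         # bytes of row header and similar per-row data

raw_bytes = rows * fields * avg_chars
bytes_per_row = fields * (avg_chars + overhead_per_field) + overhead_per_row
estimated_bytes = rows * bytes_per_row

# Same estimate after ten more years of growth at the stated rate.
rows_in_10_years = rows + 10 * growth_per_year
estimated_bytes_in_10_years = rows_in_10_years * bytes_per_row

GIB = 1024 ** 3
print(f"raw character data:   {raw_bytes / GIB:.2f} GiB")
print(f"with row overhead:    {estimated_bytes / GIB:.2f} GiB")
print(f"after 10 more years:  {estimated_bytes_in_10_years / GIB:.2f} GiB")
```

Under these assumptions the data stays well within a few gigabytes even after a decade of growth, so keeping the whole table cached in memory looks feasible.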

  1. You have a lot of columns and clearly don't use most of them. This might indicate that you can normalize the table, as there could be a lot of redundant data. That could let you restrict certain selects to only a portion of your normalized tables. Normalization can make performance worse if done incorrectly, though, and it may not apply in your case if you don't have enough redundancy.

  2. There is no point in creating big composite indexes here. Look at the database statistics, decide which columns are used most, and index them separately. The goal is to make the indexes reusable, which composite indexes can't achieve in your case. Since this is mostly a reference table (you are only inserting data, if I gathered correctly), having multiple indexes won't cause problems.

  3. Depending on the usage habits of the users, you need to decide whether partitioning by a very commonly used column would be better than indexing. Partitioning is usually effective when the selects return more than 10% of your rows.

  4. Long-running queries do not hog the database outright, since it is multi-threaded, but they do slow down other queries because of the constant thread context switching required. A solution to this is to make sure your database is using all cores of the CPU (I'm not familiar with SQL Server in this regard).
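On SQL Server specifically, one option that the answers above do not mention is the `OPTION (MAXDOP n)` query hint, which caps how many cores a single query may use so that a long scan leaves CPU for other users. The sketch below shows how a search utility might build such a query with parameters instead of pasted-in values. The table name `profiles`, the function `build_search`, and the `?` placeholder style (as used by ODBC drivers such as pyodbc) are assumptions, not part of the original question.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Only these column names may appear in generated SQL (field1 .. field21).
ALLOWED_FIELDS = {f"field{i}" for i in range(1, 22)}

Criterion = Union[str, Sequence[str]]


def build_search(criteria: Dict[str, Criterion], max_dop: int = 1) -> Tuple[str, List[str]]:
    """Build a parameterized T-SQL search query plus its parameter list.

    A string value becomes `field = ?`; a list or tuple becomes `field IN (?, ...)`.
    """
    if not criteria:
        raise ValueError("at least one search criterion is required")

    clauses: List[str] = []
    params: List[str] = []
    for field, value in criteria.items():
        # Column names cannot be parameters, so check them against a whitelist.
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {field}")
        if isinstance(value, str):
            clauses.append(f"{field} = ?")
            params.append(value)
        else:
            values = list(value)
            if not values:
                raise ValueError(f"empty value list for {field}")
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{field} IN ({placeholders})")
            params.extend(values)

    sql = (
        "SELECT * FROM profiles WHERE "
        + " AND ".join(clauses)
        + f" OPTION (MAXDOP {int(max_dop)})"  # cap parallelism for this query
    )
    return sql, params


sql, params = build_search({"field1": "X/Y", "field5": ["12/12", "12/13"]})
print(sql)
print(params)
```

With `max_dop=1` the search runs on a single core, which will likely make it slower but keeps the remaining cores free for other staff; whether that trade-off is right depends on the server and should be measured.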


 