How to query a table with 300 columns and over 200 million records?

I have a table with over 300 columns and no indexes. I need to use another table's address hash value, which is essentially a combination of first name, last name, and address information, to find matching rows in the big table.

What should I do to speed this up? The big table doesn't have the address hash value column.

I am thinking of using a search engine, but that is something I am not familiar with. Is there any way I can still use SQL Server to append data to the small table, which itself has more than 14 million records?

Index the columns you search on so that data can be fetched quickly.

SQL - Indexes

Optimizing queries for 25+ million rows

Dynamically Query 100 Million Row Table-Efficiently

The speed of the query depends on the number of rows, but you can improve it with appropriate optimizations that take into account performance factors such as:

  • Indexing (clustered/non-clustered)
  • Data Caching
  • Table Partitioning
  • Execution Plan caching
  • Data Distribution

300 columns is a sure sign of bad or no normalization. Relational databases deliver performant queries only when the data is properly normalized, to at least third normal form (3NF), and backed by suitable indexes (clustered and non-clustered, depending on the scenario).

Otherwise you will have to resort to hacks to get any decent performance out of it, especially if the table has 200 million records in it. A hack would be a new search table, which links the primary key in the big table with a hash value column based on several columns (in your case, first and last name and address). This search table would have a clustered index based on the hash column. A search would retrieve the primary key in the big table for a hash value, which you then use to get the record/row from the big table.
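The search-table idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the table and column names (big_table, address_search, addr_hash, big_id) are hypothetical, and address_hash is a made-up hash function that must be replaced by whatever actually produced the hash values in the other table. SQLite has no clustered indexes as such; a WITHOUT ROWID table keyed on the hash is the closest analogue.

```python
import hashlib
import sqlite3


def address_hash(first, last, address):
    """Hypothetical hash: SHA-256 over trimmed, lower-cased name and address."""
    key = "|".join(part.strip().lower() for part in (first, last, address))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL so the search table can be filled in one pass.
conn.create_function("address_hash", 3, address_hash)

conn.executescript("""
CREATE TABLE big_table (
    id         INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    address    TEXT,
    other_data TEXT   -- stands in for the remaining ~300 columns
);

-- Rows are stored in (addr_hash, big_id) order, which plays the role
-- of a clustered index on the hash column.
CREATE TABLE address_search (
    addr_hash TEXT    NOT NULL,
    big_id    INTEGER NOT NULL REFERENCES big_table(id),
    PRIMARY KEY (addr_hash, big_id)
) WITHOUT ROWID;
""")

conn.executemany(
    "INSERT INTO big_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "1 Main St", "x1"),
        (2, "Bob", "Ray", "2 Oak Ave", "x2"),
    ],
)

# Build the search table once: one (hash, primary key) pair per big-table row.
conn.execute("""
    INSERT INTO address_search (addr_hash, big_id)
    SELECT address_hash(first_name, last_name, address), id
    FROM big_table
""")
conn.commit()


def find_by_hash(conn, addr_hash):
    """Look up the primary key by hash, then fetch the row from the big table."""
    return conn.execute(
        """
        SELECT b.id, b.first_name, b.last_name, b.other_data
        FROM address_search AS s
        JOIN big_table AS b ON b.id = s.big_id
        WHERE s.addr_hash = ?
        """,
        (addr_hash,),
    ).fetchall()
```

A lookup such as find_by_hash(conn, address_hash("Ann", "Lee", "1 Main St")) touches only the narrow search table and then a single row of the wide table, instead of scanning all 300 columns of every row.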

If the big table doesn't have a primary key on it, then it's even worse. You will need to add that hash value as a new column in the big table, and create a non-clustered index on that hash value column. That would be column number 301, one step further from a good table design in relational databases.
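For the case without a primary key, the steps could look roughly like the sketch below. As before, this uses Python's sqlite3 module in place of SQL Server, and the names big_table, small_table, addr_hash, appended_data, and ix_big_addr_hash, as well as the address_hash function, are assumptions made for illustration. The hash must be computed exactly the way the small table's values were, and on a table of 200 million rows the backfill would probably need to run in batches rather than as one statement.

```python
import hashlib
import sqlite3


def address_hash(first, last, address):
    """Hypothetical hash: SHA-256 over trimmed, lower-cased name and address."""
    key = "|".join(part.strip().lower() for part in (first, last, address))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("address_hash", 3, address_hash)

conn.executescript("""
-- No primary key on the big table in this scenario.
CREATE TABLE big_table (
    first_name TEXT,
    last_name  TEXT,
    address    TEXT,
    other_data TEXT   -- stands in for the remaining ~300 columns
);
CREATE TABLE small_table (
    addr_hash     TEXT,
    appended_data TEXT
);
""")

conn.executemany(
    "INSERT INTO big_table VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Lee", "1 Main St", "x1"),
        ("Bob", "Ray", "2 Oak Ave", "x2"),
    ],
)
conn.execute(
    "INSERT INTO small_table (addr_hash) VALUES (?)",
    (address_hash("Bob", "Ray", "2 Oak Ave"),),
)

# Step 1: add the hash as a new column on the big table (column 301).
conn.execute("ALTER TABLE big_table ADD COLUMN addr_hash TEXT")

# Step 2: backfill the new column from the name and address columns.
conn.execute(
    "UPDATE big_table "
    "SET addr_hash = address_hash(first_name, last_name, address)"
)

# Step 3: create an ordinary (non-clustered) index on the hash column.
conn.execute("CREATE INDEX ix_big_addr_hash ON big_table (addr_hash)")

# Step 4: append data to the small table by matching on the indexed hash.
conn.execute("""
    UPDATE small_table
    SET appended_data = (
        SELECT b.other_data
        FROM big_table AS b
        WHERE b.addr_hash = small_table.addr_hash
        LIMIT 1
    )
""")
conn.commit()
```

Once the index exists, each row of the small table is matched with an index seek on addr_hash rather than a full scan of the big table.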

Perhaps you do not have the time to normalize the big table properly right now, but in the long term you will have to, or you will keep running into problem after problem.
