
Query taking too much time on a large database table with 3 million entries; how to optimize the performance?

Spring Boot Query

@Query(value="SELECT  * 
              FROM products p 
                join product_generic_name pg on pg.id = p.product_generic_name_id 
              where (p.product_name like %?1% 
                    and p.parent_product_id IS NULL 
                    and p.is_active=true and 
                    (p.is_laboratory is null or p.is_laboratory = false)
                    ) 
              or (pg.product_generic_name like %?1% 
                    and pg.is_active = true) ",nativeQuery = true)

Page<Products> findByProductNameLikeAndGenericNameLike(String searchText, Pageable pageable);

The products table has over 3 million entries and the query takes around 4 minutes to complete. How can I optimize the query performance? I tried indexing the product_name column, but it did not improve performance much.

This is a very open-ended question, I would say.

I will try to break it down for you.

There are a couple of things you can do, unless you already have.

Tip 1: Optimize Queries

In many cases database performance issues are caused by inefficient SQL queries. Optimizing your SQL queries is one of the best ways to increase database performance. When you try to do that manually, you'll encounter several dilemmas around choosing how best to improve query efficiency. These include understanding whether to write a join or a subquery, whether to use EXISTS or IN, and more. When you know the best path forward, you can write queries that improve efficiency and thus database performance as a whole. That means fewer bottlenecks and fewer unhappy end users.
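For example (a sketch only, using the tables from the question; which form runs faster depends on the optimizer and the data), the same filter can be written with IN or with EXISTS:

-- One form: IN with a non-correlated subquery.
SELECT p.*
FROM products p
WHERE p.product_generic_name_id IN (
    SELECT pg.id FROM product_generic_name pg WHERE pg.is_active = true
);

-- Another form: EXISTS with a correlated subquery, which can stop at the first match.
SELECT p.*
FROM products p
WHERE EXISTS (
    SELECT 1
    FROM product_generic_name pg
    WHERE pg.id = p.product_generic_name_id
      AND pg.is_active = true
);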

The best way to optimize queries is to use a database performance analysis solution that can guide your optimization efforts by directing you to the most inefficient queries and offering expert advice on how best to improve them.

Tip 2: Improve Indexes

In addition to queries, the other essential element of the database is the index. When done right, indexing can increase your database performance and help optimize the duration of your query execution. Indexing creates a data structure that helps keep all your data organized and makes it easier to locate information. Because it's easier to find data, indexing increases the efficiency of data retrieval and speeds up the entire process, saving both you and the system time and effort.
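As a minimal sketch (assuming MySQL, as the other answers here do), you create an index with CREATE INDEX and then check with EXPLAIN whether the optimizer actually uses it; the columns below come from the question's query, and the index name is arbitrary:

-- Composite index covering two of the filtered columns.
CREATE INDEX idx_products_parent_active ON products (parent_product_id, is_active);

-- Check the chosen plan; the "key" column shows which index (if any) is used.
EXPLAIN
SELECT *
FROM products
WHERE parent_product_id IS NULL
  AND is_active = true;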

Tip 3: Defragment Data

Data defragmentation is one of the best approaches to increasing database performance. Over time, with so much data constantly being written to and deleted from your database, your data can become fragmented. That fragmentation can slow down the data retrieval process as it interferes with a query's ability to quickly locate the information it's looking for. When you defragment data, you allow for relevant data to be grouped together and you erase index page issues. That means your I/O related operations will run faster.
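In MySQL/InnoDB (assumed here, since the other answers refer to MySQL), the usual way to do this is OPTIMIZE TABLE, which rebuilds the table and its indexes; note that a later answer in this thread considers this mostly a waste of time for InnoDB:

-- Rebuild the table and its indexes to reclaim fragmented space.
-- For InnoDB, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE (a full rebuild).
OPTIMIZE TABLE products;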

Tip 4: Increase Memory

The efficiency of your database can suffer significantly when you don't have enough memory available for the database to work correctly. Even if it seems like you have a lot of memory in total, you might not be meeting the demands of your database. A good way to figure out if you need more memory is to check how many page faults your system has. When the number of faults is high, it means your hosts are either running low on or completely out of available memory. Increasing your memory allocation will help boost efficiency and overall performance.
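If the database is MySQL (an assumption; the tip itself talks about operating-system page faults), a rough way to check whether the working set fits in memory is to compare buffer pool disk reads against total read requests:

-- Configured buffer pool size, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Innodb_buffer_pool_reads = reads that had to go to disk;
-- Innodb_buffer_pool_read_requests = all logical reads.
-- A high ratio of the former to the latter suggests the buffer pool is too small.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';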

Tip 5: Strengthen CPU

A better CPU translates directly into a more efficient database. That's why you should consider upgrading to a higher-class CPU unit if you're experiencing issues with your database performance. The more powerful your CPU is, the less strain it'll have when dealing with multiple requests and applications. When assessing your CPU, you should keep track of all the elements of CPU performance, including CPU ready times, which tell you about the times your system tried to use the CPU, but couldn't because the resources were otherwise occupied.

Adding an index to product_name won't help, as you are doing a LIKE search on it, not an exact match. For your query, you should add indexes to the following columns (see the example statements after the list):

  • is_active
  • is_laboratory
  • parent_product_id
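For reference, a sketch of the corresponding statements (index names are arbitrary; whether each index helps depends on how selective the column is):

CREATE INDEX idx_products_is_active ON products (is_active);
CREATE INDEX idx_products_is_laboratory ON products (is_laboratory);
CREATE INDEX idx_products_parent_product_id ON products (parent_product_id);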

However, doing a "free text" search with two wildcards, at the start and end of your search term, is not a great use case for a relational database. Is this the best design for this problem? If you have 3 million products, could you have a "product_group" which the user has to select to reduce the number of rows to be searched? Alternatively, this is a use case that is a good fit for a full-text search engine like ElasticSearch or Solr.

There are two bottlenecks:

  • like %?1% -- The leading wildcard means that it must read and check every row.

  • OR -- This is rarely optimizable.

If like %?1% is only looking at "words", then using a FULLTEXT index and MATCH will run much faster.
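A sketch of what that could look like (assuming MySQL/InnoDB; 'aspirin' is just a placeholder search term):

-- Add full-text indexes on the searched columns.
ALTER TABLE products ADD FULLTEXT INDEX ft_product_name (product_name);
ALTER TABLE product_generic_name ADD FULLTEXT INDEX ft_generic_name (product_generic_name);

-- Word-based search that can use the index, unlike LIKE '%...%'.
SELECT *
FROM products
WHERE MATCH(product_name) AGAINST('aspirin' IN NATURAL LANGUAGE MODE);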

OR can be turned into a UNION. It should probably be UNION DISTINCT, assuming that ?1 could be in both the name and the generic_name.
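A sketch of the question's query rewritten that way (keeping the original join in both branches; ? stands for the search parameter):

SELECT p.*
FROM products p
JOIN product_generic_name pg ON pg.id = p.product_generic_name_id
WHERE p.product_name LIKE CONCAT('%', ?, '%')
  AND p.parent_product_id IS NULL
  AND p.is_active = true
  AND (p.is_laboratory IS NULL OR p.is_laboratory = false)

UNION DISTINCT

SELECT p.*
FROM products p
JOIN product_generic_name pg ON pg.id = p.product_generic_name_id
WHERE pg.product_generic_name LIKE CONCAT('%', ?, '%')
  AND pg.is_active = true;

Each branch now has simpler conditions that the optimizer can plan independently; the leading wildcard still blocks an ordinary index on the LIKE itself, so combining this with the FULLTEXT approach above is where most of the gain would come from.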

More memory, more regular indexes, etc. -- these are not likely to help. EXPLAIN and other analysis tools tell you what is going on now, not how to improve the query and/or indexes. Defragmentation (in InnoDB) is mostly a waste of time. There is only a narrow range of CPU speeds; this has not changed in over 20 years. Extra cores are useless, since MySQL will use only one core for this query. A mere 3M rows means that you probably have more than enough RAM.
