
How can I make the following query more efficient with many lookup conditions?

First of all, I need pagination functionality, limiting the results of the following lookup query, which has a couple of WHERE conditions.

SELECT SQL_CALC_FOUND_ROWS
    a.uid, b.NAME
FROM
    `profiles` AS a FORCE INDEX(profiles_country_city_gender_index)
JOIN `users` AS b
ON b.id = a.uid
AND a.country = 'INDONESIA'
AND a.gender = 0
JOIN (
    SELECT
        a.uid
    FROM
       profile_details AS a
    JOIN profile_details AS kids ON kids.uid = a.uid
    AND kids.kids_pref = 1
    JOIN profile_details AS current ON current.uid = a.uid
    AND current.current_relationship = 1
    JOIN profile_details AS smoking ON smoking.uid = a.uid
    AND smoking.smoking_pref = 1
    ) AS e ON e.uid = a.uid
AND ( TIMESTAMPDIFF( YEAR, a.birth_date, NOW()) BETWEEN 25 AND 35 )
LIMIT 33;

All of the tables here are in a one-to-one relation with the users table:

  • Profiles
  • Profile_details

Users uses the id column as its primary key, and the other tables reference it through a uid foreign key. At the start I had no problem with the above query/design, but once the tables grew to 300K rows the query took 0.726 s to fetch the result, which is too slow in my opinion.

I tried counting the rows for the same conditions with COUNT(*) and got approximately the same timing. I need a faster way to get the row count for these lookup conditions, so the pagination system works as expected with less waiting time.

As you can see in the query, I am using:

FORCE INDEX(profiles_country_city_gender_index)

I think it is not helping much, because the scope still yields a large number of rows when filtering with:

AND a.country = 'INDONESIA' 
AND a.gender = 0

That scope returns 148801 rows (country = 'INDONESIA' with gender = 0). If I also pair it with city, the query time is acceptable because the result set is much smaller, but that will still become a problem when the tables grow larger some day.

For anyone who asks for the query's EXPLAIN output:

Explain SELECT SQL_CALC_FOUND_ROWS
        a.uid, 
        b.NAME ...

Results:

| select_type | table   | type   | possible_keys                      | key                                | key_len | ref              | rows   | filtered  | Extra                              |
+-------------+---------+--------+------------------------------------+------------------------------------+---------+------------------+--------+-----------+------------------------------------+
| SIMPLE      | a       | ref    | profiles_country_city_gender_index | profiles_country_city_gender_index | 242     | const            | 148801 | 10.00     | Using index condition; Using where |
| SIMPLE      | a       | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid | 1      | 100.00    | Using index                        |
| SIMPLE      | kids    | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid | 1      | 10.00     | Using where                        |
| SIMPLE      | current | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid | 1      | 10.00     | Using where                        |
| SIMPLE      | smoking | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid | 1      | 10.00     | Using where                        |
| SIMPLE      | b       | eq_ref | PRIMARY                            | PRIMARY                            | 3       | restfulapi.a.uid | 1      | 100.00    |                                    |

As you can see in the EXPLAIN result, there is no table scan, no "Using temporary", and no range scan; only index condition lookups. Still, if the country scope someday matches at least 1 million rows, just scaling the current timing up from 300K rows gives an unacceptable result.

Below are the table definitions, in case they help with analysing the problem:

CREATE TABLE `profile_details` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `uid` mediumint(8) unsigned NOT NULL,
  `intents` tinyint(4) NOT NULL DEFAULT '3',
  `height` smallint(6) DEFAULT NULL,
  `body_type` tinyint(4) NOT NULL DEFAULT '5',
  `kids_pref` tinyint(4) NOT NULL DEFAULT '1',
  `drinking_pref` tinyint(4) NOT NULL DEFAULT '2',
  `living_with` tinyint(4) NOT NULL DEFAULT '0',
  `current_relationship` tinyint(4) NOT NULL DEFAULT '1',
  `sexual_pref` tinyint(4) NOT NULL DEFAULT '1',
  `smoking_pref` tinyint(4) NOT NULL DEFAULT '0',
  `status_online` tinyint(4) NOT NULL DEFAULT '0',
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `profile_details_uid_foreign` (`uid`),
  KEY `idx_multipart` (`intents`,`body_type`,`kids_pref`,`drinking_pref`,`living_with`,`current_relationship`,`sexual_pref`,`smoking_pref`),
  CONSTRAINT `profile_details_uid_foreign` FOREIGN KEY (`uid`) REFERENCES `users` (`id`)
)

CREATE TABLE `profiles` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `uid` mediumint(8) unsigned NOT NULL,
  `birth_date` date NOT NULL,
  `gender` tinyint(4) NOT NULL DEFAULT '0',
  `country` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'ID',
  `city` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT 'Makassar',
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `latitude` double NOT NULL DEFAULT '0',
  `longitude` double NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `profiles_uid_foreign` (`uid`),
  KEY `profiles_birth_date_index` (`birth_date`),
  KEY `profiles_latitude_longitude_index` (`latitude`,`longitude`),
  KEY `profiles_country_city_gender_index` (`country`,`city`,`gender`),
  KEY `idx_country_gender_birthdate` (`country`,`gender`,`birth_date`),
  KEY `idx_country_city_gender_birthdate` (`country`,`city`,`gender`,`birth_date`),
  CONSTRAINT `profiles_uid_foreign` FOREIGN KEY (`uid`) REFERENCES `users` (`id`)
)

How can I solve this? Do I need to redesign the tables to get an ideal system? Perhaps that is the last option.

EDIT

I tried what you suggested. First, I added an index on three columns:

CREATE INDEX profiles_country_gender_birth_date_index on `profiles`(country,gender,birth_date);

and I tried a SELECT COUNT(*) without the JOIN to profile_details:

SELECT
    COUNT(*)
FROM
    `profiles` AS a
    FORCE INDEX ( profiles_country_gender_birth_date_index )
JOIN `users` AS b ON b.id = a.uid
    AND a.country = 'INDONESIA'
    AND a.gender = 1
    AND a.birth_date BETWEEN NOW() - INTERVAL 35 YEAR
                         AND NOW() - INTERVAL 25 YEAR

The timing is not stable, varying between 0.7 s and 0.35 s, and I do not know why. Below is the EXPLAIN query plan in JSON format, in case it helps find the culprit.

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "114747.38"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "a",
          "access_type": "range",
          "possible_keys": [
            "profiles_country_gender_birth_date_index"
          ],
          "key": "profiles_country_gender_birth_date_index",
          "used_key_parts": [
            "country",
            "gender",
            "birth_date"
          ],
          "key_length": "246",
          "rows_examined_per_scan": 94066,
          "rows_produced_per_join": 32961,
          "filtered": "100.00",
          "index_condition": "((`restfulapi`.`a`.`gender` = 1) and (`restfulapi`.`a`.`country` = 'INDONESIA') and (`restfulapi`.`a`.`birth_date` between <cache>((now() - interval 35 year)) and <cache>((now() - interval 25 year))))",
          "cost_info": {
            "read_cost": "15858.00",
            "eval_cost": "6592.23",
            "prefix_cost": "75194.00",
            "data_read_per_join": "16M"
          },
          "used_columns": [
            "uid",
            "birth_date",
            "gender",
            "country"
          ]
        }
      },
      {
        "table": {
          "table_name": "b",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "id"
          ],
          "key_length": "3",
          "ref": [
            "restfulapi.a.uid"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 32961,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "32961.15",
            "eval_cost": "6592.23",
            "prefix_cost": "114747.38",
            "data_read_per_join": "89M"
          },
          "used_columns": [
            "id"
          ]
        }
      }
    ]
  }
}
Add INDEX(country, gender, birth_date)  -- in this order

and change the use of birth_date to be "sargable":

AND  ( TIMESTAMPDIFF( YEAR, a.birth_date, NOW()) BETWEEN 25 AND 35 )

to

AND a.birth_date BETWEEN NOW() - INTERVAL 35 YEAR
                     AND NOW() - INTERVAL 25 YEAR

so that the optimizer can use the index on birth_date.

LIMIT 33 -- Do you care which 33 rows? Maybe you need an ORDER BY ?
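For example, a deterministic page could be produced like this (a sketch; using uid as the sort key is an assumption, any unique indexed column works):

```sql
SELECT  a.uid, b.name
    FROM  `profiles` AS a
    JOIN  `users` AS b  ON b.id = a.uid
    WHERE  a.country = 'INDONESIA'
      AND  a.gender = 0
    ORDER BY  a.uid          -- makes each page reproducible
    LIMIT  33 OFFSET 66;     -- e.g. page 3, 33 rows per page
```

Without the ORDER BY, the server is free to return any 33 matching rows, so page boundaries can shift between requests.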

Don't do JOIN ( SELECT ... profile_details ... ) when a plain JOIN profile_details ... would work.

SQL_CALC_FOUND_ROWS costs something. Remove it to see how fast it goes, then decide whether it is worth keeping.
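One common alternative (a sketch) is to drop SQL_CALC_FOUND_ROWS from the page query and issue a separate COUNT(*) with the same filters; the count can then be cached per filter combination instead of being recomputed on every page:

```sql
-- Page query: no SQL_CALC_FOUND_ROWS overhead
SELECT  a.uid, b.name
    FROM  `profiles` AS a
    JOIN  `users` AS b  ON b.id = a.uid
    WHERE  a.country = 'INDONESIA'  AND  a.gender = 0
    LIMIT  33;

-- Separate count; the JOIN to users can be dropped here,
-- since uid is a foreign key that always matches users.id
SELECT  COUNT(*)
    FROM  `profiles` AS a
    WHERE  a.country = 'INDONESIA'  AND  a.gender = 0;
```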

I don't think you need to JOIN profile_details more than once, especially since it is 1:1 with profiles.

Here's what I mean:

Instead of JOIN ( SELECT... ) have just

JOIN  profile_details AS d  USING(uid)

Then add these to the WHERE clause:

AND  d.kids_pref = 1
AND  d.current_relationship = 1
AND  d.smoking_pref = 1 
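Putting those pieces together, the whole query might look like this (a sketch using the question's table and column names; the ORDER BY is an assumption, added only to make paging deterministic):

```sql
SELECT  a.uid, b.name
    FROM  `profiles` AS a
    JOIN  `users` AS b           ON b.id = a.uid
    JOIN  `profile_details` AS d USING(uid)
    WHERE  a.country = 'INDONESIA'
      AND  a.gender = 0
      AND  a.birth_date BETWEEN NOW() - INTERVAL 35 YEAR
                            AND NOW() - INTERVAL 25 YEAR
      AND  d.kids_pref = 1
      AND  d.current_relationship = 1
      AND  d.smoking_pref = 1
    ORDER BY  a.uid
    LIMIT  33;
```

This replaces the derived table and its three self-joins with a single JOIN, and lets INDEX(country, gender, birth_date) handle the profiles side.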

Avoiding filesort

INDEX(country, gender,   -- Tested with '='
      birth_date,        -- Tested as a "range"
      uid)               -- For the ORDER BY -- Useless!

When building an index, include columns in this order

  1. All columns tested as "column = constant".
  2. One Range (such as BETWEEN ). If this is the same as the ORDER BY then "filesort" is probably avoided.

If there were no "range" in the WHERE , then

  1. All columns tested as "column = constant".
  2. The ORDER BY columns -- assuming they are all DESC or all ASC (or, in MySQL 8.0, match the INDEX definition). This will probably avoid "filesort".
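As an illustration of rule 2, the question's existing idx_country_gender_birthdate(country, gender, birth_date) fits this pattern when birth_date appears only in the ORDER BY rather than as a range:

```sql
-- country and gender are '=' tests on the first two index
-- columns; birth_date, the third column, supplies the row
-- order, so no filesort is needed.
SELECT  uid
    FROM  `profiles`
    WHERE  country = 'INDONESIA'  AND  gender = 0
    ORDER BY  birth_date
    LIMIT  33;
```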

But an index cannot handle both a "range" and a different "order by". Consider the following. You have a list of people with last name and first name. And the query is

SELECT ...
    WHERE last_name LIKE 'Ja%'   -- a "range"
    ORDER BY first_name;

INDEX(last_name, first_name) will help with the WHERE, but leave the first_names jumbled. And vice versa.

(That is a simplification, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql for some more details.)
