First of all, I need to paginate by limiting the results of the following lookup query, which has a couple of WHERE conditions:
SELECT SQL_CALC_FOUND_ROWS
    a.uid, b.NAME
FROM
    `profiles` AS a FORCE INDEX(profiles_country_city_gender_index)
JOIN `users` AS b
    ON b.id = a.uid
    AND a.country = 'INDONESIA'
    AND a.gender = 0
JOIN (
    SELECT
        a.uid
    FROM
        profile_details AS a
    JOIN profile_details AS kids ON kids.uid = a.uid
        AND kids.kids_pref = 1
    JOIN profile_details AS current ON current.uid = a.uid
        AND current.current_relationship = 1
    JOIN profile_details AS smoking ON smoking.uid = a.uid
        AND smoking.smoking_pref = 1
) AS e ON e.uid = a.uid
    AND ( TIMESTAMPDIFF( YEAR, a.birth_date, NOW()) BETWEEN 25 AND 35 )
LIMIT 33;
All of these tables have a one-to-one relation with the `users` table: `id` is the primary key in `users`, and `uid` is the foreign key in the other tables. At the start I had no problem with this query/design, but once the tables grew to about 300K rows the query started taking about 0.726s to fetch the result, which is too slow in my opinion.
I also tried counting the rows for the same conditions with COUNT(*) and got roughly the same timing. I need a faster way to get the row count for these lookup conditions so that the pagination system works as expected with less waiting time.
As you can see, the query uses:
FORCE INDEX(profiles_country_city_gender_index)
I don't think it helps much, because the scope selected by:
AND a.country = 'INDONESIA'
AND a.gender = 0
is large (148,801 rows match country = 'INDONESIA' with gender = 0). If I also filter by city, the query time is acceptable because the result set is much smaller, but it will still become a problem someday when the tables are larger.
For anyone who asks for the query plan:
EXPLAIN SELECT SQL_CALC_FOUND_ROWS
a.uid,
b.NAME ...
Results:
+-------------+---------+--------+------------------------------------+------------------------------------+---------+------------------+--------+----------+------------------------------------+
| select_type | table   | type   | possible_keys                      | key                                | key_len | ref              | rows   | filtered | Extra                              |
+-------------+---------+--------+------------------------------------+------------------------------------+---------+------------------+--------+----------+------------------------------------+
| SIMPLE      | a       | ref    | profiles_country_city_gender_index | profiles_country_city_gender_index | 242     | const            | 148801 |    10.00 | Using index condition; Using where |
| SIMPLE      | a       | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid |      1 |   100.00 | Using index                        |
| SIMPLE      | kids    | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid |      1 |    10.00 | Using where                        |
| SIMPLE      | current | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid |      1 |    10.00 | Using where                        |
| SIMPLE      | smoking | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid |      1 |    10.00 | Using where                        |
| SIMPLE      | b       | eq_ref | PRIMARY                            | PRIMARY                            | 3       | restfulapi.a.uid |      1 |   100.00 |                                    |
+-------------+---------+--------+------------------------------------+------------------------------------+---------+------------------+--------+----------+------------------------------------+
As the EXPLAIN output shows, there is no table scan, no temporary table, and no range access; only index-condition lookups. But I imagine that if the country scope ever returns at least 1 million rows, scaling the current 300K-row timing up proportionally would be unacceptable.
Below are the table definitions, in case they help with analysing the problem:
CREATE TABLE `profile_details` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`uid` mediumint(8) unsigned NOT NULL,
`intents` tinyint(4) NOT NULL DEFAULT '3',
`height` smallint(6) DEFAULT NULL,
`body_type` tinyint(4) NOT NULL DEFAULT '5',
`kids_pref` tinyint(4) NOT NULL DEFAULT '1',
`drinking_pref` tinyint(4) NOT NULL DEFAULT '2',
`living_with` tinyint(4) NOT NULL DEFAULT '0',
`current_relationship` tinyint(4) NOT NULL DEFAULT '1',
`sexual_pref` tinyint(4) NOT NULL DEFAULT '1',
`smoking_pref` tinyint(4) NOT NULL DEFAULT '0',
`status_online` tinyint(4) NOT NULL DEFAULT '0',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `profile_details_uid_foreign` (`uid`),
KEY `idx_multipart` (`intents`,`body_type`,`kids_pref`,`drinking_pref`,`living_with`,`current_relationship`,`sexual_pref`,`smoking_pref`),
CONSTRAINT `profile_details_uid_foreign` FOREIGN KEY (`uid`) REFERENCES `users` (`id`)
)
CREATE TABLE `profiles` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`uid` mediumint(8) unsigned NOT NULL,
`birth_date` date NOT NULL,
`gender` tinyint(4) NOT NULL DEFAULT '0',
`country` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'ID',
`city` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT 'Makassar',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`latitude` double NOT NULL DEFAULT '0',
`longitude` double NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `profiles_uid_foreign` (`uid`),
KEY `profiles_birth_date_index` (`birth_date`),
KEY `profiles_latitude_longitude_index` (`latitude`,`longitude`),
KEY `profiles_country_city_gender_index` (`country`,`city`,`gender`),
KEY `idx_country_gender_birthdate` (`country`,`gender`,`birth_date`),
KEY `idx_country_city_gender_birthdate` (`country`,`city`,`gender`,`birth_date`),
CONSTRAINT `profiles_uid_foreign` FOREIGN KEY (`uid`) REFERENCES `users` (`id`)
)
How can I solve this? Do I need to redesign the tables to get an ideal system? Perhaps that is the last option.
EDIT
I tried what was suggested earlier. First I added an index on three columns:
CREATE INDEX profiles_country_gender_birth_date_index on `profiles`(country,gender,birth_date);
Then I tried SELECT COUNT(*) without the JOIN with profile_details:
SELECT
    COUNT(*)
FROM
    `profiles` AS a
    FORCE INDEX ( profiles_country_gender_birth_date_index )
JOIN `users` AS b ON b.id = a.uid
    AND a.country = 'INDONESIA'
    AND a.gender = 1
    AND a.birth_date BETWEEN NOW() - INTERVAL 35 YEAR
                         AND NOW() - INTERVAL 25 YEAR;
The timing is not stable, varying between 0.7s and 0.35s, and I don't know why. Below is the EXPLAIN query plan in JSON format, in case it helps find the culprit:
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "114747.38"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "a",
          "access_type": "range",
          "possible_keys": [
            "profiles_country_gender_birth_date_index"
          ],
          "key": "profiles_country_gender_birth_date_index",
          "used_key_parts": [
            "country",
            "gender",
            "birth_date"
          ],
          "key_length": "246",
          "rows_examined_per_scan": 94066,
          "rows_produced_per_join": 32961,
          "filtered": "100.00",
          "index_condition": "((`restfulapi`.`a`.`gender` = 1) and (`restfulapi`.`a`.`country` = 'INDONESIA') and (`restfulapi`.`a`.`birth_date` between <cache>((now() - interval 35 year)) and <cache>((now() - interval 25 year))))",
          "cost_info": {
            "read_cost": "15858.00",
            "eval_cost": "6592.23",
            "prefix_cost": "75194.00",
            "data_read_per_join": "16M"
          },
          "used_columns": [
            "uid",
            "birth_date",
            "gender",
            "country"
          ]
        }
      },
      {
        "table": {
          "table_name": "b",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "id"
          ],
          "key_length": "3",
          "ref": [
            "restfulapi.a.uid"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 32961,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "32961.15",
            "eval_cost": "6592.23",
            "prefix_cost": "114747.38",
            "data_read_per_join": "89M"
          },
          "used_columns": [
            "id"
          ]
        }
      }
    ]
  }
}
Add
INDEX(country, gender, birth_date)   -- in this order
and make the use of birth_date "sargable" by changing
AND ( TIMESTAMPDIFF( YEAR, a.birth_date, NOW()) BETWEEN 25 AND 35 )
to
AND a.birth_date BETWEEN NOW() - INTERVAL 35 YEAR
                     AND NOW() - INTERVAL 25 YEAR
so that the optimizer can use birth_date.
LIMIT 33 -- Do you care which 33 rows? Maybe you need an ORDER BY?
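If you only need "next page" navigation rather than jumping to an arbitrary page, one common alternative (my suggestion, not something in your original query) is keyset pagination: order by uid and seek past the last row of the previous page, so deep pages cost no more than page one. A sketch, where :last_seen_uid is an illustrative placeholder supplied by the application:

```sql
-- Keyset (seek) pagination sketch: the client remembers the last uid
-- it received and passes it back as :last_seen_uid (0 for the first page).
SELECT  a.uid, b.NAME
FROM    `profiles` AS a
JOIN    `users` AS b ON b.id = a.uid
WHERE   a.country = 'INDONESIA'
  AND   a.gender  = 0
  AND   a.uid > :last_seen_uid   -- seek past the previous page
ORDER BY a.uid                   -- deterministic order is required
LIMIT   33;
```

This avoids both OFFSET scanning and the need for an exact total row count on every page.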
Don't do JOIN ( SELECT ... profile_details ... ) when a plain JOIN profile_details ... would work.
SQL_CALC_FOUND_ROWS costs something. Remove it to see how fast the query goes, then decide whether it is worth keeping.
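Concretely, "remove it" means issuing two separate statements: the page query, and a plain COUNT(*) for the pagination total. A sketch using the filters from the original query (whether the count can be cached, or shown as an approximation, is an application decision):

```sql
-- Page query, no SQL_CALC_FOUND_ROWS:
SELECT a.uid, b.NAME
FROM   `profiles` AS a
JOIN   `users` AS b ON b.id = a.uid
WHERE  a.country = 'INDONESIA'
  AND  a.gender  = 0
LIMIT  33;

-- Separate count query; its result can often be cached for a while,
-- since pagination rarely needs an exact, up-to-the-second total:
SELECT COUNT(*)
FROM   `profiles` AS a
WHERE  a.country = 'INDONESIA'
  AND  a.gender  = 0;
```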
I don't think you need to JOIN profile_details more than once, especially since it is 1:1 with profiles.
Here's what I mean. Instead of JOIN ( SELECT ... ), have just
JOIN profile_details AS d USING(uid)
Then add these to the WHERE clause:
AND d.kids_pref = 1
AND d.current_relationship = 1
AND d.smoking_pref = 1
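Putting those suggestions together, the whole query might look like this (a sketch, not tested against your data):

```sql
-- Single JOIN to profile_details, sargable birth_date range,
-- no SQL_CALC_FOUND_ROWS, no derived table:
SELECT  a.uid, b.NAME
FROM    `profiles` AS a
JOIN    `users` AS b ON b.id = a.uid
JOIN    `profile_details` AS d USING(uid)
WHERE   a.country = 'INDONESIA'
  AND   a.gender  = 0
  AND   a.birth_date BETWEEN NOW() - INTERVAL 35 YEAR
                         AND NOW() - INTERVAL 25 YEAR
  AND   d.kids_pref = 1
  AND   d.current_relationship = 1
  AND   d.smoking_pref = 1
LIMIT   33;
```

With INDEX(country, gender, birth_date) on profiles, the optimizer can resolve all three profiles conditions from one index range scan before touching the other tables.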
Avoiding filesort
INDEX(country, gender,   -- Tested with '='
      birth_date,        -- Tested as a "range"
      uid)               -- For the ORDER BY -- Useless!
When building an index, include the columns in this order: first the columns tested with '=', then at most one "range" column (such as the BETWEEN on birth_date). If that range column is the same as the ORDER BY, then "filesort" is probably avoided. If there were no "range" in the WHERE, then follow the '=' columns with the ORDER BY columns -- assuming they are all DESC or all ASC (or, in MySQL 8.0, match the INDEX definition). This will probably avoid "filesort". But an index cannot handle both a "range" and a different ORDER BY. Consider the following. You have a list of people with last name and first name, and the query is
SELECT ...
WHERE last_name LIKE 'Ja%' -- a "range"
ORDER BY first_name;
INDEX(last_name, first_name) will help with the WHERE, but will leave the first names jumbled. And vice versa.
(That is a simplification, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql for some more details.)