MySql Select Query. >3 minutes response for 4M rows

Agreed, there are a few mathematical calculations in the select query, but surely nothing that should hurt performance this badly.

Below is the select query.

SELECT `p`.`id` as post_id, `p`.`description` as description, `p`.`rent` as rent, 
`p`.`created_at` as created_at, `p`.`title` as title, 
UNIX_TIMESTAMP(p.created_at) as timestamp,
`p`.`user_id` as post_user_id, `p`.`bathrooms`, `p`.`bedrooms`, `p`.`created_at`, 
`p`.`address`, `p`.`lat`, `p`.`lng`, `p`.`posted_by`, `p`.`amenities`, `p`.`user_id`, 
`p`.`smoking_policy`, `p`.`sqft`, `p`.`dogs`, `p`.`cats`, `p`.`dwelling_type`,
`p`.`deposit`, 
`p`.`furnished`, `p`.`sublease`, `p`.`sublease_duration`, `p`.`lease`,          
`p`.`property_type`,`p`.`source`, `p`.`images_json`, `sub`.`name` as sub_category_name,   
`sub`.`id` as sub_category_id, `sub`.`text` as sub_category_text, `p`.`lat` as lat, 
`p`.`lng` as lng, `p`.`phone` as phone, 
(3959 * acos( cos( radians(42.3584308) ) * cos( radians( p.lat ) ) * cos( radians(  
 p.lng ) - radians(-71.0597732) ) + sin( radians(42.3584308) ) * sin( radians( p.lat ) 
) ) ) AS distance
FROM (`T1` p)
JOIN `sub_categories` as sub ON `sub`.`id` = `p`.`sub_category_id`
AND `p`.`lng` between (-71.0597732 - 20/abs(cos(radians(42.3584308 ))*69)) 
and (-71.0597732 + 20/abs(cos(radians(42.3584308))*69)) 
AND `p`.`lat` between 42.3584308 - (20/69) and 42.3584308 + (20/69)
AND `rent` <= '9200'
AND `rent` >= '7000'
AND `bedrooms` <= '4'
AND `bathrooms` <= '3'
AND `dogs` =  '1'
AND `p`.`sub_category_id` =  '2'
HAVING `distance` <= '100'
ORDER BY `p`.`created_at` desc
LIMIT 0,12;

The search should return the available listings within a given radius of the input address (lat, lng coordinates).

The AND condition parameters (rent, bedrooms etc.) and their values are assigned dynamically based on the front-end selection.

The table structure is as follows.

CREATE TABLE `T1` (
`id` varchar(40) NOT NULL DEFAULT '',`user_id` varchar(100) NOT NULL DEFAULT '',
`sub_category_id` bigint(20) NOT NULL,
`description` text,`title` text,
`rent` int(11) DEFAULT NULL,
`utilities` int(11) DEFAULT NULL,
`bathrooms` float DEFAULT NULL,
`bedrooms` int(11) DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`address` varchar(100) DEFAULT NULL,
`lat` double DEFAULT NULL,`lng` double DEFAULT NULL,
`dwelling_type` varchar(40) DEFAULT NULL,
`furnished` varchar(20) DEFAULT NULL,
`lease_transfer_fees` int(10) DEFAULT NULL,
`dogs` int(11) DEFAULT NULL,
`cats` int(11) DEFAULT NULL,
`parking_spots` int(10) DEFAULT NULL,
`smoking_policy` varchar(5) DEFAULT NULL,
`deposit` varchar(20) DEFAULT NULL,
`sqft` bigint(20) DEFAULT NULL,
`posted_by` varchar(20) DEFAULT NULL,
`amenities` varchar(500) DEFAULT NULL,
`sublease` varchar(20) DEFAULT NULL,
`sublease_duration` int(11) DEFAULT NULL,
`lease` varchar(20) DEFAULT NULL,
`external_id` varchar(40) DEFAULT NULL,
`source` varchar(10) DEFAULT 'np',
`anchor` varchar(40) DEFAULT NULL,
`property_type` varchar(40) DEFAULT NULL,
`deleted` tinyint(1) DEFAULT '0',
`images_json` text,
`phone` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id_index` (`user_id`),
KEY `filter_combined_index` (`created_at`,`lat`,`lng`,`sub_category_id`,`rent`,     
`bedrooms`,`bathrooms`,`sqft`,`dogs`,`cats`),
KEY `sub_category_id` (`sub_category_id`),
FULLTEXT KEY `text_search_index`    
(`title`,`description`,`smoking_policy`,`posted_by`,`dwelling_type`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

The EXPLAIN result is below.

id  select_type table   type    possible_keys   key             key_len    ref     rows      Extra
1   SIMPLE      sub     const   PRIMARY,id      PRIMARY         8          const    1      Using filesort
1   SIMPLE      p       ref     sub_category_id sub_category_id 8          const    188122  Using where

Is the table structure inefficient, or the select query, or a mixture of both?

Surely 4M rows should not be a limiting factor. Thanks in advance for the advice of the resident experts.

TA!

You are basically calculating the distance for every row and only filtering on it afterwards:

SELECT [...]
(3959 * acos( cos( radians(42.3584308) ) * cos( radians( p.lat ) ) *
cos( radians(    p.lng ) - radians(-71.0597732) ) + sin(
radians(42.3584308) ) * sin( radians( p.lat )  ) ) ) AS distance
[...]
HAVING `distance` <= '100'

Because distance is a computed value, no index can serve that filter. MySQL has to read every row the other conditions let through and evaluate the formula for each one, on every query.

Additionally, the only index that includes the coordinates is not usable for this search because it starts with created_at:

KEY `filter_combined_index` (`created_at`,`lat`,`lng`,`sub_category_id`,`rent`,
`bedrooms`,`bathrooms`,`sqft`,`dogs`,`cats`),

You can try a simple per-coordinate search, with an appropriate index:

WHERE lat BETWEEN :lat_from AND :lat_to
and lng BETWEEN :lng_from AND :lng_to

... where the from and to values are the edges of the bounding square. Once you have narrowed the rows down to the potential matches, you can refine the result with the actual circle.
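
A minimal sketch of that two-step approach, not a tuned solution. It assumes a new index on the coordinates (the name lat_lng_index is made up for illustration), a 20-mile radius to match the bounding box in the question, and a shortened column list. The inner query narrows the rows down with the bounding square and the outer query applies the circle. GREATEST and LEAST only guard ACOS against rounding errors that land just outside [-1, 1].

```sql
-- Hypothetical index: the coordinates come first, so the range conditions can use it.
ALTER TABLE `T1` ADD INDEX `lat_lng_index` (`lat`, `lng`);

SELECT candidates.*
FROM (
    -- Step 1: bounding square, pre-computed for 20 miles around (42.3584308, -71.0597732)
    SELECT p.id, p.title, p.rent, p.lat, p.lng, p.created_at,
           3959 * ACOS(GREATEST(-1.0, LEAST(1.0,
               COS(RADIANS(42.3584308)) * COS(RADIANS(p.lat))
                 * COS(RADIANS(p.lng) - RADIANS(-71.0597732))
               + SIN(RADIANS(42.3584308)) * SIN(RADIANS(p.lat))
           ))) AS distance
    FROM `T1` p
    WHERE p.lat BETWEEN 42.0685757 AND 42.6482859
      AND p.lng BETWEEN -71.452028 AND -70.6675175
) AS candidates
-- Step 2: keep only the rows inside the actual circle
WHERE candidates.distance <= 20
ORDER BY candidates.created_at DESC
LIMIT 0, 12;
```

Running EXPLAIN on the inner SELECT should show whether the new index is picked. With a range on lat, MySQL will typically use only the lat part of the index for the range scan and check lng on the rows it finds, so how much this helps depends on how the listings are spread geographically.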

You are asking the database to do a lot of work here, but I'd also suggest restructuring your query slightly.

Firstly, you're using a JOIN but no explicit WHERE clause, so you're actually specifying one very big JOIN condition. Internally there's a good chance MySQL will figure out that it's really just a WHERE clause, but given that none of those conditions have any bearing on the join itself, they should probably be in their own WHERE. This might make a difference, as in theory it reduces the number of rows before doing the join.

Secondly, you're using a HAVING clause, yet you don't have any aggregation in the query. The general rule of thumb is that HAVING is used for filtering on aggregates (e.g. COUNT or AVG), and WHERE is used everywhere else.
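
To illustrate the distinction, here is a small sketch against the T1 table from the question (the threshold of 10 is arbitrary and only there for the example). WHERE filters individual rows before grouping, while HAVING filters the groups afterwards.

```sql
SELECT sub_category_id, COUNT(*) AS listings
FROM `T1`
WHERE rent BETWEEN 7000 AND 9200      -- row-level filter, applied before grouping
GROUP BY sub_category_id
HAVING COUNT(*) >= 10;                -- group-level filter, applied after aggregation
```

MySQL does let HAVING reference a select-list alias such as distance without any aggregation, which is why the original query runs at all, but that filter is only applied after the rows have been produced.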

As @Joachim and @LHristov have both touched on, doing those calculations at query time might not be such a great idea. You're already asking for a lot of data, and now you're asking MySQL to run the calculation for every row it finds and then run a separate calculation for the join. Unfortunately you've said you can't persist the result, so that can't be avoided, but @Álvaro's suggestion might improve things if the following changes don't.

Restructure the query to move the filters out of the JOIN condition into a WHERE clause, and remove the HAVING. I'd expect the resulting query to look like the following:

SELECT
  `p`.`id` as post_id
, `p`.`description` as description
, `p`.`rent` as rent
, `p`.`created_at` as created_at
, `p`.`title` as title
, UNIX_TIMESTAMP(p.created_at) as timestamp
, `p`.`user_id` as post_user_id
, `p`.`bathrooms`
, `p`.`bedrooms`
, `p`.`created_at`
, `p`.`address`
, `p`.`lat`
, `p`.`lng`
, `p`.`posted_by`
, `p`.`amenities`
, `p`.`user_id`
, `p`.`smoking_policy`
, `p`.`sqft`
, `p`.`dogs`
, `p`.`cats`
, `p`.`dwelling_type`
, `p`.`deposit`
, `p`.`furnished`
, `p`.`sublease`
, `p`.`sublease_duration`
, `p`.`lease`
, `p`.`property_type`
, `p`.`source`
, `p`.`images_json`
, `sub`.`name` as sub_category_name
, `sub`.`id` as sub_category_id
, `sub`.`text` as sub_category_text
, `p`.`lat` as lat
, `p`.`lng` as lng
, `p`.`phone` as phone
, (3959 * acos( cos( radians(42.3584308) ) * cos( radians( p.lat ) ) * cos( radians( p.lng ) - radians(-71.0597732) ) + sin( radians(42.3584308) ) * sin( radians( p.lat ) ) ) ) AS distance
FROM `T1` p
JOIN `sub_categories` as sub
  ON `sub`.`id` = `p`.`sub_category_id`
 AND `p`.`sub_category_id` = '2'
WHERE ( `p`.`lng` BETWEEN (-71.0597732 - 20/abs(cos(radians(42.3584308))*69)) AND (-71.0597732 + 20/abs(cos(radians(42.3584308))*69)) )
  AND ( `p`.`lat` BETWEEN (42.3584308 - (20/69)) AND (42.3584308 + (20/69)) )
  AND `rent` <= 9200
  AND `rent` >= 7000
  AND `bedrooms` <= 4
  AND `bathrooms` <= 3
  AND `dogs` = 1
  -- a select-list alias cannot be referenced in WHERE, so the distance expression is repeated here
  AND (3959 * acos( cos( radians(42.3584308) ) * cos( radians( p.lat ) ) * cos( radians( p.lng ) - radians(-71.0597732) ) + sin( radians(42.3584308) ) * sin( radians( p.lat ) ) ) ) <= 100
ORDER BY `p`.`created_at` desc
LIMIT 0,12;

Hopefully, as mentioned before, this will considerably reduce the number of rows that the calculation is performed on, because the filtering happens before the calculation and the JOIN. Having everything in the JOIN condition may cause MySQL to compute the distance for all rows before checking whether they join, which is MUCH slower, as you can imagine.

I also just noticed that you're selecting a few of the columns multiple times, such as created_at and user_id. Not sure if that's intentional, but it can make a minor difference.

Also, the WHERE clause conditions on integer fields such as bedrooms, rent and dogs are all being compared as if they were strings. I've changed that in the query above.

This is a comment, taking advantage of the answer window formatting options.

FWIW, I find this easier to read... and I'd wrap the distance calculation in a function...

SELECT p.id post_id
     , p.description 
     , p.rent 
     , p.title
     , p.user_id post_user_id
     , p.bathrooms
     , p.bedrooms
     , p.created_at
     , p.address
     , p.posted_by
     , p.amenities
     , p.smoking_policy
     , p.sqft
     , p.dogs
     , p.cats
     , p.dwelling_type
     , p.deposit
     , p.furnished
     , p.sublease
     , p.sublease_duration
     , p.lease
     , p.property_type
     , p.source
     , p.images_json
     , sub.name sub_category_name
     , sub.id sub_category_id
     , sub.text sub_category_text
     , p.lat 
     , p.lng 
     , p.phone 
     , my_distance_function(p.lat,p.lng,42.3584308,-71.0597732) distance
  FROM T1 p
  JOIN sub_categories sub
    ON sub.id = p.sub_category_id
 WHERE my_distance_function(p.lat,p.lng,42.3584308,-71.0597732) <= 100
   AND p.lng BETWEEN -71.452028 AND -70.6675175
   AND p.lat BETWEEN 42.0685757 AND 42.6482859
   AND rent <= 9200     
   AND rent >= 7000
   AND bedrooms <= 4
   AND bathrooms <= 3
   AND dogs =  1
   AND p.sub_category_id = 2

 ORDER 
    BY p.created_at DESC
 LIMIT 0,12;
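
my_distance_function is not defined anywhere in this thread. One way it could look, as a sketch only: it assumes the argument order (lat1, lng1, lat2, lng2), a result in miles, and the same spherical formula as the question. The clamping with GREATEST and LEAST is an addition to keep ACOS from returning NULL when rounding pushes its argument just outside [-1, 1].

```sql
-- Hypothetical definition; name and argument order are assumptions, not part of the original answer.
CREATE FUNCTION my_distance_function(lat1 DOUBLE, lng1 DOUBLE, lat2 DOUBLE, lng2 DOUBLE)
RETURNS DOUBLE
DETERMINISTIC
NO SQL
RETURN 3959 * ACOS(GREATEST(-1.0, LEAST(1.0,
           COS(RADIANS(lat1)) * COS(RADIANS(lat2)) * COS(RADIANS(lng2) - RADIANS(lng1))
           + SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))
       )));

-- Example call: the search centre against a point 20/69 degrees further south,
-- which should come out at roughly 20 miles.
SELECT my_distance_function(42.3584308, -71.0597732, 42.0685757, -71.0597732) AS miles;
```

The function only tidies the query. It is still evaluated once per candidate row, so the bounding-box conditions on lat and lng remain what actually limits the work.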
