MySQL query slow when grouping by PM parent id (vBulletin database)

Original post: 2020-06-14 08:50:02 · Tags: c# / mysql / asp.net-core / mariadb / vbulletin

I want to fetch all PMs from vBulletin as conversations, the way an instant messenger shows them, for use in my .NET Core library with Dapper. This means: if A sends a message to B and B replies, that is one conversation with two messages. Since this causes performance issues, I tried to track them down by running the Dapper queries directly in DBeaver.

To fetch one page of conversations from the inbox, I wrote the following query:

SELECT pm.pmid
FROM pm, pmtext AS txt
WHERE pm.pmtextid = txt.pmtextid 
AND (pm.userid = 123 OR txt.fromuserid = 123)
AND pm.folderid != -1
GROUP BY IF(pm.parentpmid != 0, pm.parentpmid, pm.pmid)
LIMIT 0, 50

This gives me the first 50 conversation ids for user #123. It works, but takes ~440 ms to execute. I tried adding indexes to all relevant fields:

ALTER TABLE pmtext ADD INDEX fromuserid_only(fromuserid);
ALTER TABLE pm ADD INDEX userid_only(userid);
ALTER TABLE pm ADD INDEX parentpmid(parentpmid);

but it's still slow. The GROUP BY seems to be the cause. Even when I just do GROUP BY pm.parentpmid (which would produce wrong data, but serves as a performance test), the run time is no better. When I remove the GROUP BY, the query is pretty fast (~12 ms).

My query that counts the total pages of conversations is similar, but without the GROUP BY, and it's fast (< 20 ms):

// DbConnection db = ...
string sqlTotalPages = @"
    SELECT CEIL(COUNT(*)/ 50) AS pages
    FROM pm, pmtext AS txt
    WHERE pm.pmtextid = txt.pmtextid 
    AND (pm.userid = 18 OR txt.fromuserid = 18)";
int totalPages = db.QueryFirstOrDefault<int>(sqlTotalPages);

Why does GROUP BY slow down the query so massively? How could I improve the performance?

Table structure from vB:

CREATE TABLE `pm` (
  `pmid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `pmtextid` int(10) unsigned NOT NULL DEFAULT '0',
  `userid` int(10) unsigned NOT NULL DEFAULT '0',
  `folderid` smallint(6) NOT NULL DEFAULT '0',
  `messageread` smallint(5) unsigned NOT NULL DEFAULT '0',
  `parentpmid` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`pmid`),
  KEY `pmtextid` (`pmtextid`),
  KEY `userid` (`userid`,`folderid`),
  KEY `userid_only` (`userid`),
  KEY `parentpmid` (`parentpmid`)
) ENGINE=MyISAM AUTO_INCREMENT=221965 DEFAULT CHARSET=latin1;

CREATE TABLE `pmtext` (
  `pmtextid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `fromuserid` int(10) unsigned NOT NULL DEFAULT '0',
  `fromusername` varchar(100) NOT NULL DEFAULT '',
  `title` varchar(250) NOT NULL DEFAULT '',
  `message` mediumtext,
  `touserarray` mediumtext,
  `iconid` smallint(5) unsigned NOT NULL DEFAULT '0',
  `dateline` int(10) unsigned NOT NULL DEFAULT '0',
  `showsignature` smallint(5) unsigned NOT NULL DEFAULT '0',
  `allowsmilie` smallint(5) unsigned NOT NULL DEFAULT '1',
  `reportthreadid` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`pmtextid`),
  KEY `fromuserid` (`fromuserid`,`dateline`),
  KEY `fromuserid_only` (`fromuserid`),
  KEY `fromuserid_only2` (`fromuserid`)
) ENGINE=MyISAM AUTO_INCREMENT=118470 DEFAULT CHARSET=latin1;

2 answers

I think the reason GROUP BY causes such an increase in processing time is the LIMIT. Without GROUP BY, the DB engine can stop processing rows once it has found 50 that match your criteria. With the GROUP BY clause, however, the entire table needs to be processed and grouped together, and only then are the first 50 results returned.

As for a solution: would you get the correct result if you removed the GROUP BY and added "AND pm.parentpmid = 0" to the WHERE clause? The GROUP BY clause seems to be there to remove rows that have a parent from the result, which is done more efficiently with WHERE (assuming every row with a parent also has its parent present among the results).
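One way to check this explanation, as a sketch rather than a definitive diagnosis, is to compare the execution plans of the two variants with EXPLAIN. Both queries are taken from the question; the comments describe what to look for and are not measured output from this database.

```sql
-- Plan for the grouped query. If the Extra column reports
-- "Using temporary" and/or "Using filesort", the server collects and
-- groups all matching rows before LIMIT can be applied.
EXPLAIN
SELECT pm.pmid
FROM pm, pmtext AS txt
WHERE pm.pmtextid = txt.pmtextid
AND (pm.userid = 123 OR txt.fromuserid = 123)
AND pm.folderid != -1
GROUP BY IF(pm.parentpmid != 0, pm.parentpmid, pm.pmid)
LIMIT 0, 50;

-- Plan for the same query without GROUP BY, for comparison.
-- Here the server may stop as soon as 50 matching rows are found.
EXPLAIN
SELECT pm.pmid
FROM pm, pmtext AS txt
WHERE pm.pmtextid = txt.pmtextid
AND (pm.userid = 123 OR txt.fromuserid = 123)
AND pm.folderid != -1
LIMIT 0, 50;
```

If the two plans differ mainly in the Extra column, that would support the idea that the grouping step, not the join, accounts for the extra time.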

In order to optimize your query, I need to know what you want to achieve with the GROUP BY clause. Could you give a small example of table contents together with your expected outcome?

If you only want to show the parent mails, then I agree with Erik H that it is better to use the following query:

SELECT pm.pmid
FROM pm, pmtext AS txt
WHERE pm.pmtextid = txt.pmtextid 
AND (pm.userid = 123 OR txt.fromuserid = 123)
AND pm.folderid != -1
AND pm.parentpmid = 0
LIMIT 0, 50;

but that gives a different result than your query does.

The effect of your GROUP BY looks quite arbitrary to me. Since pmid is neither part of an aggregate function nor grouped on, MySQL/MariaDB will return the value from one row of each group, typically the first one it encounters, and that choice is not guaranteed.

When I add the following values to your database:
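To make the result independent of which row the server happens to pick, the selected columns can be limited to the grouping expression and aggregates. The following is a minimal sketch under that assumption; the aliases conversation_id, first_pmid and message_count are illustrative names, not part of the vBulletin schema, and the query is meant to show deterministic output rather than to fix the run time.

```sql
-- Every selected column is either the grouping expression itself
-- or an aggregate, so each group yields a well-defined row.
SELECT IF(pm.parentpmid != 0, pm.parentpmid, pm.pmid) AS conversation_id,
       MIN(pm.pmid) AS first_pmid,     -- smallest pmid in the group
       COUNT(*)     AS message_count   -- rows that fell into the group
FROM pm
INNER JOIN pmtext AS txt ON txt.pmtextid = pm.pmtextid
WHERE (pm.userid = 123 OR txt.fromuserid = 123)
  AND pm.folderid != -1
GROUP BY IF(pm.parentpmid != 0, pm.parentpmid, pm.pmid)
ORDER BY conversation_id               -- stable order makes LIMIT paging repeatable
LIMIT 0, 50;
```

Note that the grouping expression only looks at the direct parent, so a reply to a reply lands in a different group than the original message.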

INSERT INTO pmtext (`fromuserid`, `fromusername`,`title`,`message`,`touserarray`,`iconid`,`dateline`,`showsignature`,`allowsmilie`,`reportthreadid`)
VALUES 
    (123, 'Pete',  'Title1',            'Hello1', '', 0, 0, 0, 1, 0),
    (123, 'Pete',  'Title2',            'Hello2', '', 0, 0, 0, 1, 0),
    (2,   'Hank',  'Re: Title1',        'Hello3', '', 0, 0, 0, 1, 0),
    (2,   'Hank',  'Re: Title2',        'Hello4', '', 0, 0, 0, 1, 0),
    (3,   'Chris', 'Re: Title2(a)',     'Hello5', '', 0, 0, 0, 1, 0),
    (2,   'Hank',  'Re: Re: Title2(a)', 'Hello6', '', 0, 0, 0, 1, 0),
    (123, 'Pete',  'Title3',            'Hello7', '', 0, 0, 0, 1, 0),
    (123, 'Pete',  'Re: Re: Title1',    'Hello8', '', 0, 0, 0, 1, 0),
    (123, 'Pete',  'Title4',            'Hello9', '', 0, 0, 0, 1, 0);

INSERT INTO pm ( `pmtextid`, `userid`, `folderid`, `messageread`, `parentpmid`)
VALUES
  (118470 , 123, 0, 0, 0),
  (118471 , 123, 0, 0, 0), 
  (118472 , 123, 0, 0, 221965),
  (118473 , 123, 0, 0, 221966), 
  (118474 , 123, 0, 0, 221966),
  (118475 , 123, 0, 0, 221969), 
  (118476 , 123, 0, 0, 0),
  (118477 , 123, 0, 0, 221967),
  (118478 , 123, 0, 0, 0);

Then your query will return: