How would I make this query run faster?
How would I make this query run faster...?
SELECT account_id, account_name, account_update, account_sold, account_mds,
       ftp_url, ftp_livestatus, number_digits, number_cw,
       client_name, ppc_status, user_name
FROM Accounts, FTPDetails, SiteNumbers, Clients, PPC, Users
WHERE Accounts.account_id = FTPDetails.ftp_accountid
  AND Accounts.account_id = SiteNumbers.number_accountid
  AND Accounts.account_client = Clients.client_id
  AND Accounts.account_id = PPC.ppc_accountid
  AND Accounts.account_designer = Users.user_id
  AND Accounts.account_active = 'active'
  AND FTPDetails.ftp_active = 'active'
  AND SiteNumbers.number_active = 'active'
  AND Clients.client_active = 'active'
  AND PPC.ppc_active = 'active'
  AND Users.user_active = 'active'
ORDER BY Accounts.account_update DESC
Thanks in advance :)
EXPLAIN query results:
I don't really have any foreign keys set up. I was trying to avoid making alterations to the database, as I will have to do a complete overhaul soon.
The only primary keys are the id of each table, e.g. account_id, ftp_id, ppc_id...
Indexes
You need, at least, an index on every field that is used in a JOIN condition.
Indexes on the fields that appear in WHERE, GROUP BY or ORDER BY clauses are useful most of the time, too.
When two or more fields of the same table are used in JOINs (or in WHERE, GROUP BY or ORDER BY), a compound (combined) index on those fields may be better than separate indexes. For example, in the SiteNumbers table, possible indexes are the compound (number_accountid, number_active) or (number_active, number_accountid).
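As a rough sketch of what such a compound index could look like, assuming MySQL and that no index on these columns exists yet (the index name is made up for illustration):

```sql
-- Hypothetical index name. The account id comes first here so the index
-- serves the join on number_accountid, with number_active as a second filter.
CREATE INDEX idx_sitenumbers_account_active
    ON SiteNumbers (number_accountid, number_active);

-- List the indexes on the table to confirm the new one is there.
SHOW INDEX FROM SiteNumbers;
```

Which of the two column orders works better depends on the data and on the join order the optimizer picks, so it is worth checking both with EXPLAIN.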
Conditions on fields that are Boolean (ON/OFF, active/inactive) sometimes slow queries down (as indexes on them are not selective and thus not very helpful). Restructuring (further normalizing) the tables is an option in that case, but you can probably avoid the added complexity.
Besides the usual advice (examine the EXPLAIN plan, add indexes where needed, test variations of the query), I notice that in your query there is a partial Cartesian product. The table Accounts has one-to-many relationships to three tables: FTPDetails, SiteNumbers and PPC. This has the effect that if you have, for example, 1000 accounts, and every account is related to, say, 10 FTPDetails, 20 SiteNumbers and 3 PPCs, the query will return 600 rows for every account (the product of 10x20x3). In total, that is 600K rows in which much of the data is duplicated.
You could instead split the query into three, plus one for the base data (Accounts and the remaining tables). That way, only 34K rows of data (of smaller length) would be transferred:
Accounts JOIN Clients JOIN Users (with all fields needed from these tables): 1K rows
Accounts JOIN FTPDetails (with Accounts.account_id and all fields from FTPDetails): 10K rows
Accounts JOIN SiteNumbers (with Accounts.account_id and all fields from SiteNumbers): 20K rows
Accounts JOIN PPC (with Accounts.account_id and all fields from PPC): 3K rows
and then use the data from the 4 queries on the client side to show the combined info.
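The four queries could be sketched roughly as follows. This assumes each column belongs to the table its prefix suggests, and it selects only the columns the original query uses rather than every field. The aliases are illustrative.

```sql
-- 1. Base data: one row per active account (about 1K rows in the example).
SELECT a.account_id, a.account_name, a.account_update,
       a.account_sold, a.account_mds, c.client_name, u.user_name
FROM Accounts a
INNER JOIN Clients c
    ON a.account_client = c.client_id AND c.client_active = 'active'
INNER JOIN Users u
    ON a.account_designer = u.user_id AND u.user_active = 'active'
WHERE a.account_active = 'active'
ORDER BY a.account_update DESC;

-- 2. FTP details per account (about 10K rows).
SELECT a.account_id, f.ftp_url, f.ftp_livestatus
FROM Accounts a
INNER JOIN FTPDetails f
    ON a.account_id = f.ftp_accountid AND f.ftp_active = 'active'
WHERE a.account_active = 'active';

-- 3. Site numbers per account (about 20K rows).
SELECT a.account_id, s.number_digits, s.number_cw
FROM Accounts a
INNER JOIN SiteNumbers s
    ON a.account_id = s.number_accountid AND s.number_active = 'active'
WHERE a.account_active = 'active';

-- 4. PPC rows per account (about 3K rows).
SELECT a.account_id, p.ppc_status
FROM Accounts a
INNER JOIN PPC p
    ON a.account_id = p.ppc_accountid AND p.ppc_active = 'active'
WHERE a.account_active = 'active';
```

Note that the single original query only returns accounts that have at least one active row in every joined table. If that behaviour matters, the client code has to drop accounts that are missing from any of the four result sets.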
I would add the following indexes:
Table Accounts
index on (account_designer)
index on (account_client)
index on (account_active, account_id)
index on (account_update)
Table FTPDetails
index on (ftp_active, ftp_accountid)
Table SiteNumbers
index on (number_active, number_accountid)
Table PPC
index on (ppc_active, ppc_accountid)
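In MySQL the list above might translate into statements like these. The index names are invented for the example; only the table and column names come from the list.

```sql
-- Accounts: join columns, the active filter, and the ORDER BY column.
CREATE INDEX idx_accounts_designer  ON Accounts (account_designer);
CREATE INDEX idx_accounts_client    ON Accounts (account_client);
CREATE INDEX idx_accounts_active_id ON Accounts (account_active, account_id);
CREATE INDEX idx_accounts_update    ON Accounts (account_update);

-- Child tables: active flag first, then the column used in the join.
CREATE INDEX idx_ftpdetails_active_account
    ON FTPDetails (ftp_active, ftp_accountid);
CREATE INDEX idx_sitenumbers_active_account
    ON SiteNumbers (number_active, number_accountid);
CREATE INDEX idx_ppc_active_account
    ON PPC (ppc_active, ppc_accountid);
```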
Use EXPLAIN to find out which indexes could be used and which index is actually used. Create an appropriate index if necessary.
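For instance, prefixing a query with EXPLAIN shows the plan without running it. The reduced two-table query below is only an illustration, not the full query from the question:

```sql
-- In the output, possible_keys lists the indexes MySQL could use for each
-- table, and key shows the one it actually chose (NULL means none).
EXPLAIN
SELECT a.account_id, f.ftp_url
FROM Accounts a
INNER JOIN FTPDetails f
    ON a.account_id = f.ftp_accountid
WHERE a.account_active = 'active'
  AND f.ftp_active = 'active';
```

A row with type = ALL means a full table scan for that table, which is usually the first place to look for a missing index.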
If FTPDetails.ftp_active only has the two valid entries 'active' and 'inactive', use BOOL as the data type.
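One cautious way to try this is to add a new flag column next to the existing one instead of converting it in place. The column name ftp_is_active is hypothetical, and the sketch assumes ftp_active currently stores the strings 'active' and 'inactive':

```sql
-- BOOL is a synonym for TINYINT(1) in MySQL.
ALTER TABLE FTPDetails
    ADD COLUMN ftp_is_active BOOL NOT NULL DEFAULT 0;

-- The null-safe comparison <=> yields 1 or 0, never NULL,
-- so rows with a NULL ftp_active end up as 0 (inactive).
UPDATE FTPDetails
SET ftp_is_active = (ftp_active <=> 'active');

-- Queries would then filter on the flag instead of the string:
--   ... AND FTPDetails.ftp_is_active = 1
```

Keeping the old column until every query has been switched over makes the change easy to roll back.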
As a side note: I strongly suggest using explicit joins instead of implicit ones:
SELECT
account_id, account_name, account_update, account_sold, account_mds,
ftp_url, ftp_livestatus,
number_digits, number_cw,
client_name,
ppc_status,
user_name
FROM Accounts
INNER JOIN FTPDetails
ON Accounts.account_id = FTPDetails.ftp_accountid
AND FTPDetails.ftp_active = 'active'
INNER JOIN SiteNumbers
ON Accounts.account_id = SiteNumbers.number_accountid
AND SiteNumbers.number_active = 'active'
INNER JOIN Clients
ON Accounts.account_client = Clients.client_id
AND Clients.client_active = 'active'
INNER JOIN PPC
ON Accounts.account_id = PPC.ppc_accountid
AND PPC.ppc_active = 'active'
INNER JOIN Users
ON Accounts.account_designer = Users.user_id
AND Users.user_active = 'active'
WHERE Accounts.account_active = 'active'
ORDER BY Accounts.account_update DESC
This makes the query much more readable because the join condition is close to the name of the table that is being joined.
EXPLAIN, and benchmark different options. For starters, I'm sure that several queries will be faster than this monster. First, because the query optimiser will spend a lot of time examining which join order is best (5! = 120 possibilities). And second, queries like SELECT... WHERE....active = 'active' will be cached (though that depends on the amount of data changes).
One of your main problems is here: x.y_active = 'active'
Problem: low cardinality
The active field is a boolean field with 2 possible values; as such it has very low cardinality. MySQL (or any SQL database, for that matter) will not use an index when 30% or more of the rows have the same value.
Forcing the index is useless because it will make your query slower, not faster.
Solution: partition your tables
A solution is to partition your tables on the active columns.
This will exclude all non-active rows from consideration, and will make the select act as if you actually have a working index on the xxx_active fields.
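A minimal sketch for one table, assuming MySQL 5.5 or later, that ftp_active is a CHAR or VARCHAR column holding only 'active' and 'inactive', and that ftp_id is the current primary key. MySQL requires the partitioning column to be part of every unique key, so the primary key is widened first. The partition names are made up.

```sql
-- The partitioning column must be included in the primary key.
ALTER TABLE FTPDetails
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (ftp_id, ftp_active);

-- One partition per status value. The statement fails if any row
-- holds a value that is not listed here.
ALTER TABLE FTPDetails
    PARTITION BY LIST COLUMNS (ftp_active) (
        PARTITION p_active   VALUES IN ('active'),
        PARTITION p_inactive VALUES IN ('inactive')
    );
```

With this in place, a condition such as ftp_active = 'active' lets MySQL skip the p_inactive partition entirely. Widening the primary key means ftp_id alone is no longer enforced as unique, so try this on a copy of the table first.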
Sidenote
Please don't ever use implicit where joins; it's much too error-prone and confusing to be useful.
Use a syntax like Oswald's answer instead.
Links:
Cardinality: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
Cardinality and indexes: http://www.bennadel.com/blog/1424-Exploring-The-Cardinality-And-Selectivity-Of-SQL-Conditions.htm
MySQL partitioning: http://dev.mysql.com/doc/refman/5.5/en/partitioning.html