How would I make this query run faster?

How would I make this query run faster?

SELECT    account_id, 
          account_name, 
          account_update, 
          account_sold, 
          account_mds, 
          ftp_url,         
          ftp_livestatus, 
          number_digits, 
          number_cw,
          client_name, 
          ppc_status, 
          user_name 
FROM     
         Accounts, 
         FTPDetails, 
         SiteNumbers, 
         Clients, 
         PPC, 
         Users 

WHERE    Accounts.account_id = FTPDetails.ftp_accountid 
AND      Accounts.account_id = SiteNumbers.number_accountid 
AND      Accounts.account_client = Clients.client_id     
AND      Accounts.account_id = PPC.ppc_accountid 
AND      Accounts.account_designer = Users.user_id   
AND      Accounts.account_active = 'active' 
AND      FTPDetails.ftp_active = 'active' 
AND      SiteNumbers.number_active = 'active' 
AND      Clients.client_active = 'active'    
AND      PPC.ppc_active = 'active'   
AND      Users.user_active = 'active' 
ORDER BY 
         Accounts.account_update DESC

Thanks in advance :)

EXPLAIN query results:

First part of the table

Second part of the table

I don't really have any foreign keys set up. I was trying to avoid making alterations to the database, as I will have to do a complete overhaul soon.

The only primary keys are the id of each table, e.g. account_id, ftp_id, ppc_id...

Indexes

  • You need, at the very least, an index on every field that is used in a JOIN condition.

  • Indexes on the fields that appear in WHERE, GROUP BY or ORDER BY clauses are useful most of the time, too.

  • When two or more fields of one table are used in JOINs (or in WHERE, GROUP BY or ORDER BY), a compound (combined) index on those fields may be better than separate indexes. For example, in the SiteNumbers table, possible indexes are the compound (number_accountid, number_active) or (number_active, number_accountid).

  • Conditions on fields that are Boolean (ON/OFF, active/inactive) sometimes slow queries down, because an index on such a field is not selective and thus not very helpful. Restructuring (further normalizing) the tables is an option in that case, but you can probably avoid the added complexity.
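As a rough sketch of the compound-index suggestion for SiteNumbers, in MySQL syntax. Only the table and column names come from the question; the index name idx_sitenumbers_account_active is made up for illustration.

```sql
-- Compound index covering both the join column and the filter column.
-- number_accountid comes first here; the reverse order
-- (number_active, number_accountid) is the other candidate mentioned above.
CREATE INDEX idx_sitenumbers_account_active
    ON SiteNumbers (number_accountid, number_active);

-- List the indexes that now exist on the table.
SHOW INDEX FROM SiteNumbers;
```

Which column order works better depends on the data and on the join order the optimizer picks, so it is worth trying both and comparing the EXPLAIN output.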


Besides the usual advice (examine the EXPLAIN plan, add indexes where needed, test variations of the query),

I notice that in your query there is a partial Cartesian product. The table Accounts has a one-to-many relationship to each of three tables: FTPDetails, SiteNumbers and PPC. This has the effect that if you have, for example, 1000 accounts, and every account is related to, say, 10 FTPDetails, 20 SiteNumbers and 3 PPCs, the query will return 600 rows for every account (the product of 10 x 20 x 3). In total that is 600K rows, in which much of the data is duplicated.
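To see how large this fan-out is on real data, something like the following could be run. It is only a sketch: the aliases and the cnt and joined_rows column names are invented here, and it ignores Clients and Users, which are many-to-one and so contribute a factor of at most 1.

```sql
-- Count the active child rows per account in each one-to-many table,
-- then multiply the counts to estimate how many rows the big join
-- produces for that account.
SELECT a.account_id,
       f.cnt AS ftp_rows,
       s.cnt AS number_rows,
       p.cnt AS ppc_rows,
       f.cnt * s.cnt * p.cnt AS joined_rows
FROM Accounts AS a
INNER JOIN (SELECT ftp_accountid, COUNT(*) AS cnt
            FROM FTPDetails
            WHERE ftp_active = 'active'
            GROUP BY ftp_accountid) AS f
        ON f.ftp_accountid = a.account_id
INNER JOIN (SELECT number_accountid, COUNT(*) AS cnt
            FROM SiteNumbers
            WHERE number_active = 'active'
            GROUP BY number_accountid) AS s
        ON s.number_accountid = a.account_id
INNER JOIN (SELECT ppc_accountid, COUNT(*) AS cnt
            FROM PPC
            WHERE ppc_active = 'active'
            GROUP BY ppc_accountid) AS p
        ON p.ppc_accountid = a.account_id
WHERE a.account_active = 'active'
ORDER BY joined_rows DESC
LIMIT 20;
```

Accounts near the top of this list are the ones that inflate the result set the most.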

You could instead split the query into three queries, plus one for the base data (Accounts and the remaining tables). That way, only 34K rows of data (each of smaller length) would be transferred:

Accounts JOIN Clients JOIN Users 
  (with all fields needed from these tables)
  1K rows

Accounts JOIN FTPDetails
  (with Accounts.account_id and all fields from FTPDetails)
  10K rows

Accounts JOIN SiteNumbers
  (with Accounts.account_id and all fields from SiteNumbers)
  20K rows

Accounts JOIN PPC
  (with Accounts.account_id and all fields from PPC)
  3K rows

and then use the data from the 4 queries on the client side to show the combined info.
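Written out, the four queries might look roughly like this. The assignment of each selected column to a table is inferred from the column prefixes in the question, and the aliases are arbitrary.

```sql
-- 1. Base data: one row per active account.
SELECT a.account_id, a.account_name, a.account_update,
       a.account_sold, a.account_mds,
       c.client_name, u.user_name
FROM Accounts AS a
INNER JOIN Clients AS c
        ON c.client_id = a.account_client
       AND c.client_active = 'active'
INNER JOIN Users AS u
        ON u.user_id = a.account_designer
       AND u.user_active = 'active'
WHERE a.account_active = 'active'
ORDER BY a.account_update DESC;

-- 2. FTP details, keyed by account_id.
SELECT a.account_id, f.ftp_url, f.ftp_livestatus
FROM Accounts AS a
INNER JOIN FTPDetails AS f
        ON f.ftp_accountid = a.account_id
       AND f.ftp_active = 'active'
WHERE a.account_active = 'active';

-- 3. Site numbers, keyed by account_id.
SELECT a.account_id, s.number_digits, s.number_cw
FROM Accounts AS a
INNER JOIN SiteNumbers AS s
        ON s.number_accountid = a.account_id
       AND s.number_active = 'active'
WHERE a.account_active = 'active';

-- 4. PPC rows, keyed by account_id.
SELECT a.account_id, p.ppc_status
FROM Accounts AS a
INNER JOIN PPC AS p
        ON p.ppc_accountid = a.account_id
       AND p.ppc_active = 'active'
WHERE a.account_active = 'active';
```

One difference from the single query is worth keeping in mind: the original inner joins drop any account that lacks an active row in FTPDetails, SiteNumbers or PPC, or whose client or designer is inactive. The client code that merges the four result sets on account_id has to apply the same rule if identical output is required.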



I would add the following indexes:

Table Accounts
  index on (account_designer)
  index on (account_client)
  index on (account_active, account_id)
  index on (account_update)

Table FTPDetails
  index on (ftp_active, ftp_accountid)

Table SiteNumbers
  index on (number_active, number_accountid)

Table PPC
  index on (ppc_active, ppc_accountid)

Use EXPLAIN to find out which index could be used and which index is actually used. Create an appropriate index if necessary.
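In MySQL, the index list above could be created along these lines, and EXPLAIN then shows whether the optimizer picks the new indexes up. The index names are placeholders chosen for this sketch; the query is a cut-down two-table version of the original.

```sql
-- Indexes from the list above (names are hypothetical).
CREATE INDEX idx_accounts_designer      ON Accounts (account_designer);
CREATE INDEX idx_accounts_client        ON Accounts (account_client);
CREATE INDEX idx_accounts_active_id     ON Accounts (account_active, account_id);
CREATE INDEX idx_accounts_update        ON Accounts (account_update);
CREATE INDEX idx_ftp_active_account     ON FTPDetails (ftp_active, ftp_accountid);
CREATE INDEX idx_numbers_active_account ON SiteNumbers (number_active, number_accountid);
CREATE INDEX idx_ppc_active_account     ON PPC (ppc_active, ppc_accountid);

-- In the EXPLAIN output, the possible_keys column lists the indexes
-- MySQL could use for each table, and the key column shows the one
-- it actually chose.
EXPLAIN
SELECT a.account_id, a.account_update, f.ftp_url
FROM Accounts AS a
INNER JOIN FTPDetails AS f
        ON f.ftp_accountid = a.account_id
       AND f.ftp_active = 'active'
WHERE a.account_active = 'active'
ORDER BY a.account_update DESC;
```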

If FTPDetails.ftp_active only has the two valid entries 'active' and 'inactive', use BOOL as the data type.
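One possible way to carry out that conversion without touching the existing column right away is sketched below. The new column name ftp_is_active is an assumption made for this example; in MySQL, BOOL is a synonym for TINYINT(1).

```sql
-- 1. Add the new flag column (hypothetical name), defaulting to inactive.
ALTER TABLE FTPDetails
    ADD COLUMN ftp_is_active BOOL NOT NULL DEFAULT FALSE;

-- 2. Copy the meaning of the old text column into the flag.
--    IF() yields FALSE for 'inactive' and for NULL alike.
UPDATE FTPDetails
   SET ftp_is_active = IF(ftp_active = 'active', TRUE, FALSE);

-- 3. Queries can then filter on the flag instead of the string.
SELECT ftp_url, ftp_livestatus
FROM FTPDetails
WHERE ftp_is_active = TRUE;
```

The old ftp_active column can be dropped once every query that reads it has been switched over.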

As a side note: I strongly suggest using explicit joins instead of implicit ones:

SELECT
  account_id, account_name, account_update, account_sold, account_mds, 
  ftp_url, ftp_livestatus, 
  number_digits, number_cw,
  client_name, 
  ppc_status, 
  user_name 
FROM Accounts 
INNER JOIN FTPDetails
  ON  Accounts.account_id = FTPDetails.ftp_accountid
  AND FTPDetails.ftp_active = 'active'
INNER JOIN SiteNumbers
  ON  Accounts.account_id = SiteNumbers.number_accountid 
  AND SiteNumbers.number_active = 'active'
INNER JOIN Clients
  ON  Accounts.account_client = Clients.client_id
  AND Clients.client_active = 'active'
INNER JOIN PPC
  ON  Accounts.account_id = PPC.ppc_accountid
  AND PPC.ppc_active = 'active'
INNER JOIN Users
  ON  Accounts.account_designer = Users.user_id
  AND Users.user_active = 'active'
WHERE Accounts.account_active = 'active' 
ORDER BY Accounts.account_update DESC

This makes the query much more readable because the join condition is close to the name of the table that is being joined.

EXPLAIN, benchmark different options. For starters, I'm sure that several queries will be faster than this monster. First, because the query optimiser will spend a lot of time examining which join order is best (5! = 120 possibilities). And second, queries like SELECT ... WHERE ....active = 'active' will be cached (though that depends on how much the data changes).

One of your main problems is here: x.y_active = 'active'

Problem: low cardinality
The active field is a boolean field with 2 possible values; as such it has very low cardinality. MySQL (or any SQL database, for that matter) will typically not use an index when roughly 30% or more of the rows have the same value.
Forcing the index is useless because it will make your query slower, not faster.
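A quick way to check how the values are actually distributed, shown here for FTPDetails (the row_count and selectivity aliases are just labels for this sketch):

```sql
-- How many rows hold each value of the status column?
SELECT ftp_active, COUNT(*) AS row_count
FROM FTPDetails
GROUP BY ftp_active;

-- Selectivity: distinct values divided by total rows.
-- Values close to 0 mean an index on this column alone filters very little.
SELECT COUNT(DISTINCT ftp_active) / COUNT(*) AS selectivity
FROM FTPDetails;

-- The Cardinality column reports MySQL's own estimate for each index.
SHOW INDEX FROM FTPDetails;
```

If nearly all rows are 'active', the filter removes almost nothing and a full scan of that table is likely the cheaper plan anyway.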

Solution: partition your tables
A solution is to partition your tables on the active columns.
This will exclude all non-active rows from consideration, and will make the SELECT act as if you actually had a working index on the xxx_active fields.
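A hedged sketch of what list partitioning on the status column might look like for FTPDetails in MySQL 5.5 or later. It assumes that ftp_active is a NOT NULL CHAR or VARCHAR column (LIST COLUMNS does not accept ENUM) holding only 'active' and 'inactive', that ftp_id is currently the sole primary key, and that the table has no foreign keys. The partition names are invented. MySQL requires every primary or unique key to include the partitioning column, hence the key change first.

```sql
-- The partitioning column must be part of every unique key,
-- so widen the primary key before partitioning.
ALTER TABLE FTPDetails
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (ftp_id, ftp_active);

-- One partition per status value (partition names are hypothetical).
ALTER TABLE FTPDetails
    PARTITION BY LIST COLUMNS (ftp_active) (
        PARTITION p_active   VALUES IN ('active'),
        PARTITION p_inactive VALUES IN ('inactive')
    );

-- A filter on ftp_active can now be served from p_active alone.
SELECT ftp_accountid, ftp_url, ftp_livestatus
FROM FTPDetails
WHERE ftp_active = 'active';
```

Partition pruning can be checked with EXPLAIN PARTITIONS in MySQL 5.5 to 5.7; later versions show a partitions column in plain EXPLAIN output. The second ALTER fails if any existing row holds a value not listed in a partition, so it is worth checking the distinct values first.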

Sidenote
Please don't ever use implicit WHERE joins; they are much too error-prone and confusing to be useful.
Use a syntax like the one in Oswald's answer instead.

Links:
Cardinality: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
Cardinality and indexes: http://www.bennadel.com/blog/1424-Exploring-The-Cardinality-And-Selectivity-Of-SQL-Conditions.htm
MySQL partitioning: http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
