Create a MySQL index for WHERE clause and ORDER BY

Consider this table:

CREATE TABLE tbl_tax (
  taxdata_id int(11) NOT NULL AUTO_INCREMENT,
  tax_year varchar(255) NOT NULL,
  display_pid varchar(255) NOT NULL,
  type varchar(255) NOT NULL,
  tax_id varchar(255) NOT NULL,
  tax_amount varchar(255) NOT NULL,
  total_due varchar(255) NOT NULL,
  total_paid varchar(255) NOT NULL,
  paid_wcert varchar(255) NOT NULL,
  datelast_adv varchar(255) NOT NULL,
  pmtmade_today varchar(255) NOT NULL,
  owner_name varchar(255) NOT NULL,
  PRIMARY KEY (taxdata_id),
  UNIQUE KEY unique_tbl_tax_TaxidYear (tax_id,tax_year),
  KEY tax_year_2 (tax_year, owner_name, tax_id, display_pid, 
    type, tax_amount, total_due, total_paid, datelast_adv, pmtmade_today, 
    taxdata_id, paid_wcert)
) ENGINE=InnoDB AUTO_INCREMENT=100000 DEFAULT CHARSET=latin1;

And this SQL query:

SELECT tax_year
     , tax_id
     , owner_name
     , display_pid
     , type
     , tax_amount
     , total_due
     , total_paid
     , datelast_adv
     , pmtmade_today
     , taxdata_id
     , paid_wcert
  FROM tbl_tax
 WHERE tax_year >= '2015'
   AND tax_year <= '2019'
 ORDER 
    BY tax_year DESC;

I want to create an index and have tried creating a covering index.

Quoting from this article, “The general rule is to choose the columns for filtering first (WHERE clause with equality conditions), then sorting/grouping (ORDER BY and GROUP BY clauses) and finally the data projection (SELECT clause).”

ALTER TABLE tbl_tax
ADD INDEX (
    `tax_year`, `owner_name`, `tax_id`, `display_pid`, 
    `type`, `tax_amount`, `total_due`, `total_paid`, `datelast_adv`, `pmtmade_today`, 
    `taxdata_id`, `paid_wcert`
);

Running EXPLAIN shows:

        "id" : 1,
        "select_type" : "SIMPLE",
        "table" : "tbl_tax",
        "partitions" : null,
        "type" : "index",
        "possible_keys" : "tax_year_2",
        "key" : "tax_year_2",
        "key_len" : "2831",
        "ref" : null,
        "rows" : 271630,
        "filtered" : 50.00,
        "Extra" : "Using where; Backward index scan; Using index"   

While creating indexes, I am aware that:

  1. The WHERE clause includes range predicates (<=, >=).
  2. The query includes ORDER BY in a different order than the order in which rows are accessed.

These could be the reasons that the EXPLAIN output shows "rows": 271630.

However, the SQL query's result set is only ~2000 rows.

I have read many articles, but I am still struggling to optimize this.

What can I do to optimize this better? Can I create the indexes in a better way? Can I make any changes to the SQL query? Also, feel free to correct me if I have misunderstood something here.

This is an interesting case because normally we like to see Using index in the EXPLAIN plan, but in this case it's a detriment.

The reason is that this is type: index, which means it's doing an index scan: it's scanning the whole index, not just the rows that match your condition. That's why it shows rows: 271630. This is basically the size of your table (or at least what the optimizer estimates to be the size of your table based on its statistics).
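To see how far that estimate is from reality, one could compare the rows figure in EXPLAIN with actual counts. This is only a sketch using the table and predicate from the question; ANALYZE TABLE refreshes the statistics the estimate is based on.

```sql
-- Refresh the statistics that the optimizer's row estimates are based on.
ANALYZE TABLE tbl_tax;

-- Actual number of rows in the table (compare with "rows" in EXPLAIN).
SELECT COUNT(*) AS total_rows
  FROM tbl_tax;

-- Actual number of rows matching the condition from the question.
SELECT COUNT(*) AS matching_rows
  FROM tbl_tax
 WHERE tax_year >= '2015'
   AND tax_year <= '2019';
```

If total_rows is close to 271630 while matching_rows is around 2000, the plan really is reading far more index entries than it needs.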

In this case, I don't think adding every column to your index is helping. You would have been better off with an index on one column: tax_year.

Then I would expect the EXPLAIN to show type: range because of your conditions, which indicates that the only rows it examines are those that match the condition.

Then we'd see filtered: 100.00, which indicates all the rows examined are included in the result, which is good. It means the query was efficient in that no row was examined and then filtered out.

Also, since your ORDER BY is on the same column, I would still expect Using filesort to be absent, and that's good.
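A minimal sketch of that suggestion, assuming the hypothetical index name idx_tax_year (any unused name would do) and the query from the question:

```sql
-- idx_tax_year is a hypothetical name chosen for this example.
ALTER TABLE tbl_tax
  ADD INDEX idx_tax_year (tax_year);

-- Re-check the plan; the hope is type: range on the new index.
EXPLAIN
SELECT tax_year, tax_id, owner_name, display_pid, type, tax_amount,
       total_due, total_paid, datelast_adv, pmtmade_today,
       taxdata_id, paid_wcert
  FROM tbl_tax
 WHERE tax_year >= '2015'
   AND tax_year <= '2019'
 ORDER BY tax_year DESC;
```

Whether the optimizer actually picks the new index depends on its cost estimates, so the plan should be checked rather than assumed.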


Re your comment:

I suppose your condition of tax_year between 2015 and 2019 matches a significantly large subset of the table. MySQL chooses not to use the index if your condition matches a big portion of the rows. It estimates it would be more costly to use the index than to just scan the table.

If you think the optimizer is making the wrong choice, you can give it a hint that a table scan should be assumed to be more costly:

... FROM tbl_tax FORCE INDEX(tax_year) ...

(I'm assuming the name of the index is tax_year, but you should replace that with the name of the index in your case.)

I also agree with the others that your use of varchar(255) for every attribute column is inappropriate.
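Spelled out in full, the hinted query might look like the sketch below. The index name idx_tax_year is hypothetical, and the column list is shortened for readability.

```sql
-- FORCE INDEX makes a table scan count as very expensive, so the
-- named index is used whenever it can satisfy the query.
-- idx_tax_year is a hypothetical index on (tax_year).
SELECT tax_year, tax_id, owner_name, total_due
  FROM tbl_tax FORCE INDEX (idx_tax_year)
 WHERE tax_year >= '2015'
   AND tax_year <= '2019'
 ORDER BY tax_year DESC;
```

Prefixing the statement with EXPLAIN shows whether the hint changed the chosen key and access type.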

INDEX(tax_year, ...) does handle that WHERE and ORDER BY.

"Query includes ORDER BY in a different order than the order in which rows are accessed."

False. The WHERE does not specify an order for accessing them. In fact the EXPLAIN says "Backward index scan". All is well.

Use reasonable datatypes, such as a 1-byte YEAR for tax_year instead of VARCHAR(255), which takes about 5 bytes for a four-digit year in latin1.

Arithmetic on varchars (for "amounts", etc.) will be messy.
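As a rough sketch of what tighter datatypes could look like: the DECIMAL(12,2) precision is an assumption, and the conversion assumes every existing value is a valid year or number (in strict SQL mode the ALTER fails otherwise), so it is worth trying on a copy of the table first.

```sql
-- Assumed target types; adjust the precision to the real data.
ALTER TABLE tbl_tax
  MODIFY tax_year   YEAR          NOT NULL,
  MODIFY tax_amount DECIMAL(12,2) NOT NULL,
  MODIFY total_due  DECIMAL(12,2) NOT NULL;

-- With a numeric year, the range can be written without string literals.
SELECT tax_year, tax_id, total_due
  FROM tbl_tax
 WHERE tax_year BETWEEN 2015 AND 2019
 ORDER BY tax_year DESC;
```

Any index that includes these columns is rebuilt by the ALTER and becomes correspondingly narrower.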

Sure, the "covering" index helps a little. But I don't like to make indexes bigger than, say, 5 columns. Your big index helps that query some, but hurts INSERTs some, too.

(And I agree with Bill.)
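The key_len of 2831 in the EXPLAIN output gives a sense of how wide that index is. A rough reconstruction, assuming that EXPLAIN counts each NOT NULL latin1 VARCHAR(255) key part as 255 bytes plus 2 length bytes, that the INT key part is 4 bytes, and that total_paid is declared like the other varchar columns:

```sql
-- 11 VARCHAR(255) key parts at (255 + 2) bytes each,
-- plus 4 bytes for the INT column taxdata_id.
SELECT 11 * (255 + 2) + 4 AS key_len_bytes;  -- 2831
```

The figure is an upper bound per index entry as reported by EXPLAIN, not the space each entry actually occupies on disk.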
