Create a MySQL index for WHERE clause and ORDER BY
Considering this table,
CREATE TABLE tbl_tax (
    taxdata_id int(11) NOT NULL AUTO_INCREMENT,
    tax_year varchar(255) NOT NULL,
    display_pid varchar(255) NOT NULL,
    type varchar(255) NOT NULL,
    tax_id varchar(255) NOT NULL,
    tax_amount varchar(255) NOT NULL,
    total_due varchar(255) NOT NULL,
    total_paid varchar(255) NOT NULL,
    paid_wcert varchar(255) NOT NULL,
    datelast_adv varchar(255) NOT NULL,
    pmtmade_today varchar(255) NOT NULL,
    owner_name varchar(255) NOT NULL,
    PRIMARY KEY (taxdata_id),
    UNIQUE KEY unique_tbl_tax_TaxidYear (tax_id, tax_year),
    KEY tax_year_2 (tax_year, owner_name, tax_id, display_pid,
        type, tax_amount, total_due, total_paid, datelast_adv, pmtmade_today,
        taxdata_id, paid_wcert)
) ENGINE=InnoDB AUTO_INCREMENT=100000 DEFAULT CHARSET=latin1;
Considering this SQL query,
SELECT tax_year
, tax_id
, owner_name
, display_pid
, type
, tax_amount
, total_due
, total_paid
, datelast_adv
, pmtmade_today
, taxdata_id
, paid_wcert
FROM tbl_tax
WHERE tax_year >= '2015'
AND tax_year <= '2019'
ORDER
BY tax_year DESC;
I want to create an index and have tried creating a covering index.
Quoting from this article, “The general rule is to choose the columns for filtering first (WHERE clause with equality conditions), then sorting/grouping (ORDER BY and GROUP BY clauses) and finally the data projection (SELECT clause).”
ALTER TABLE tbl_tax
ADD INDEX (
`tax_year`, `owner_name`, `tax_id`, `display_pid`,
`type`, `tax_amount`, `total_due`, `total_paid`, `datelast_adv`, `pmtmade_today`,
`taxdata_id`, `paid_wcert`
);
Running EXPLAIN on the query shows:
"id" : 1,
"select_type" : "SIMPLE",
"table" : "tbl_tax",
"partitions" : null,
"type" : "index",
"possible_keys" : "tax_year_2",
"key" : "tax_year_2",
"key_len" : "2831",
"ref" : null,
"rows" : 271630,
"filtered" : 50.00,
"Extra" : "Using where; Backward index scan; Using index"
While creating indexes, I am aware that:
These could be the reasons that the output of EXPLAIN shows "rows": 271630.
However, the SQL query's result set is only ~2000 rows.
I have read many articles, but I am still struggling to optimize this.
What can I do in this situation to get better optimization? Can I create the indexes in a better way? Can I make any changes to the SQL query? Also, feel free to correct me if I have misunderstood something here.
This is an interesting case because normally we like to see Using index in the EXPLAIN plan, but in this case it's a detriment.
The reason is that this is type: index, which means it's doing an index scan. That is, it's scanning the whole index, not just the rows that match your condition. That's why it shows rows: 271630. This is basically the size of your table (or at least what the optimizer estimates to be the size of your table based on its statistics).
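To see how far that estimate is from reality, you can compare the optimizer's statistics with actual counts. This is only a minimal sketch using standard MySQL statements; it assumes tbl_tax lives in the currently selected schema.

```sql
-- Refresh the index statistics the optimizer uses for its row estimates.
ANALYZE TABLE tbl_tax;

-- Estimated row count from the data dictionary (approximate for InnoDB).
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'tbl_tax';

-- Exact row count, and the number of rows the range condition really matches.
SELECT COUNT(*) AS total_rows,
       SUM(tax_year >= '2015' AND tax_year <= '2019') AS matching_rows
FROM tbl_tax;
```

If matching_rows is a small fraction of total_rows, a range access on tax_year should examine far fewer rows than the full index scan shown in the question.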
In this case, I don't think it helped that you added every column to your index. You would have been better off with an index on one column: tax_year.
Then I would expect the EXPLAIN to show type: range because of your conditions, which indicates that the only rows it examines are those that match the condition.
Then we'd see filtered: 100.00, which indicates all the rows examined are included in the result, which is good. It means the query was efficient in that no row was examined and then filtered out.
Also, since your ORDER BY is on the same column, I would still expect Using filesort to be absent, and that's good.
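A sketch of what that would look like, in case it helps. The index name idx_tax_year is hypothetical (any unused name works), and the plan you actually get depends on your data and MySQL version.

```sql
-- idx_tax_year is a hypothetical name; pick any name not already used on the table.
ALTER TABLE tbl_tax ADD INDEX idx_tax_year (tax_year);

-- Re-check the plan for the original query.
EXPLAIN
SELECT tax_year, tax_id, owner_name, display_pid, type, tax_amount, total_due,
       total_paid, datelast_adv, pmtmade_today, taxdata_id, paid_wcert
FROM tbl_tax
WHERE tax_year >= '2015'
  AND tax_year <= '2019'
ORDER BY tax_year DESC;
```

If the wide index tax_year_2 is still present, the optimizer may keep choosing it, so compare the plans with and without it before deciding which index to keep.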
Re your comment:
I suppose your condition of the tax_year between 2015 and 2019 is matching a significantly large subset of the table. MySQL chooses not to use the index if your condition matches a big portion of the rows. It estimates it would be more costly to use the index than to just scan the table.
If you think the optimizer is making a wrong choice, you can give it a hint that a table-scan should be assumed to be more costly:
... FROM tbl_tax FORCE INDEX(tax_year) ...
(I'm assuming the name of the index is tax_year, but you should replace that with the name of the index in your case.)
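Spelled out against the query from the question, the hint would go roughly like this. As above, tax_year is only the assumed index name; SHOW INDEX lists the real names on your table.

```sql
-- List the indexes that actually exist, to find the right name for the hint.
SHOW INDEX FROM tbl_tax;

-- tax_year here is the assumed index name; substitute the real one.
SELECT tax_year, tax_id, owner_name, display_pid, type, tax_amount, total_due,
       total_paid, datelast_adv, pmtmade_today, taxdata_id, paid_wcert
FROM tbl_tax FORCE INDEX (tax_year)
WHERE tax_year >= '2015'
  AND tax_year <= '2019'
ORDER BY tax_year DESC;
```

It is worth timing the query with and without the hint, since the optimizer's choice of a scan can turn out to be the faster one when the range covers much of the table.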
I also agree with the others that your use of varchar(255) for every attribute column is inappropriate.
INDEX(tax_year, ...) does handle that WHERE and ORDER BY.
“Query includes ORDER_BY in a different order than the order in which rows are accessed.”
False. The WHERE does not specify an order for accessing them. In fact the EXPLAIN says "Backward index scan". All is well.
Use reasonable datatypes, such as a 1-byte YEAR for tax_year instead of VARCHAR(255), which takes about 6 bytes for a year (four characters plus a length prefix).
Arithmetic on varchars (for "amounts", etc.) will be messy.
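As a rough illustration of both points, here is a sketch. The DECIMAL(12,2) precision is my assumption, not something from the question, and the ALTER only works if every existing value converts cleanly, so try it on a copy of the table first.

```sql
-- Why strings are messy: these operands are compared character by character,
-- not as numbers.
SELECT '9' > '10' AS string_comparison;

-- Possible retyping; DECIMAL(12,2) is an assumed precision, adjust to your data.
ALTER TABLE tbl_tax
  MODIFY tax_year YEAR NOT NULL,
  MODIFY tax_amount DECIMAL(12,2) NOT NULL,
  MODIFY total_due DECIMAL(12,2) NOT NULL;

-- With proper types, ranges and sums work on numbers directly.
SELECT tax_year, SUM(tax_amount) AS total_tax
FROM tbl_tax
WHERE tax_year BETWEEN 2015 AND 2019
GROUP BY tax_year;
```

Smaller column types also shrink every index that contains them, including the UNIQUE key on (tax_id, tax_year).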
Sure, the "covering" index helps a little. But I don't like to make indexes bigger than, say, 5 columns. Your big index helps that query some, but hurts INSERTs some, too.
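The key_len of 2831 in the question's EXPLAIN output gives a sense of how wide that index is. A rough accounting reproduces the figure, assuming latin1 (one byte per character), a 2-byte length prefix for each VARCHAR in the key, and 4 bytes for the INT:

```sql
-- 11 VARCHAR(255) NOT NULL columns: 255 bytes + 2-byte length prefix each.
-- 1 INT NOT NULL column (taxdata_id): 4 bytes.
SELECT 11 * (255 + 2) + 4 AS key_len_bytes;  -- 2831
```

Note that key_len is the maximum possible key length, not the space each entry really occupies, but it does show that all twelve columns are part of the key being used.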
(And I agree with Bill.)