
Create a MySQL index for a WHERE clause and ORDER BY

Consider this table:

CREATE TABLE tbl_tax (
  taxdata_id int(11) NOT NULL AUTO_INCREMENT,
  tax_year varchar(255) NOT NULL,
  display_pid varchar(255) NOT NULL,
  type varchar(255) NOT NULL,
  tax_id varchar(255) NOT NULL,
  tax_amount varchar(255) NOT NULL,
  total_due varchar(255) NOT NULL,
  total_paid varchar(255) NOT NULL,
  paid_wcert varchar(255) NOT NULL,
  datelast_adv varchar(255) NOT NULL,
  pmtmade_today varchar(255) NOT NULL,
  owner_name varchar(255) NOT NULL,
  PRIMARY KEY (taxdata_id),
  UNIQUE KEY unique_tbl_tax_TaxidYear (tax_id,tax_year),
  KEY tax_year_2 (tax_year, owner_name, tax_id, display_pid, 
    type, tax_amount, total_due, total_paid, datelast_adv, pmtmade_today, 
    taxdata_id, paid_wcert)
) ENGINE=InnoDB AUTO_INCREMENT=100000 DEFAULT CHARSET=latin1;

Consider this SQL query:

SELECT tax_year
     , tax_id
     , owner_name
     , display_pid
     , type
     , tax_amount
     , total_due
     , total_paid
     , datelast_adv
     , pmtmade_today
     , taxdata_id
     , paid_wcert
  FROM tbl_tax
 WHERE tax_year >= '2015'
   AND tax_year <= '2019'
 ORDER 
    BY tax_year DESC;

I want to create an index, and I have tried creating a covering index.

Quoting from this article, “The general rule is to choose the columns for filtering first (WHERE clause with equality conditions), then sorting/grouping (ORDER BY and GROUP BY clauses) and finally the data projection (SELECT clause).”
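The rule quoted above can be written down as a tiny function, which makes the resulting column order explicit for this query. This is only a sketch: `suggest_index_columns` is a hypothetical helper, not part of MySQL or any library, and it ignores selectivity and range predicates entirely.

```python
# Hypothetical helper applying the quoted rule of thumb:
# equality-filter columns first, then ORDER BY / GROUP BY columns,
# then the remaining SELECT columns (so the index can be covering).
def suggest_index_columns(equality_cols, sort_cols, select_cols):
    ordered = []
    for col in [*equality_cols, *sort_cols, *select_cols]:
        if col not in ordered:  # keep each column only at its first position
            ordered.append(col)
    return ordered


select_cols = [
    "tax_year", "tax_id", "owner_name", "display_pid", "type",
    "tax_amount", "total_due", "total_paid", "datelast_adv",
    "pmtmade_today", "taxdata_id", "paid_wcert",
]

# The query has no equality predicate (only a range on tax_year)
# and sorts by tax_year.
columns = suggest_index_columns([], ["tax_year"], select_cols)
print(columns[0], len(columns))  # tax_year 12
```

Because there is no equality predicate, tax_year leads only by virtue of the ORDER BY; the order of the trailing projection columns doesn't matter for the index to be covering.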

ALTER TABLE tbl_tax
ADD INDEX (
    `tax_year`, `owner_name`, `tax_id`, `display_pid`, 
    `type`, `tax_amount`, `total_due`, `total_paid`, `datelast_adv`, `pmtmade_today`, 
    `taxdata_id`, `paid_wcert`
);

Running EXPLAIN shows:

        "id" : 1,
        "select_type" : "SIMPLE",
        "table" : "tbl_tax",
        "partitions" : null,
        "type" : "index",
        "possible_keys" : "tax_year_2",
        "key" : "tax_year_2",
        "key_len" : "2831",
        "ref" : null,
        "rows" : 271630,
        "filtered" : 50.00,
        "Extra" : "Using where; Backward index scan; Using index"   

While creating indexes, I am aware that:

  1. The WHERE clause includes range predicates (<=, >=).
  2. The query includes ORDER BY in a different order than the order in which rows are accessed.

These could be the reasons why the EXPLAIN output shows "rows": 271630.

However, the query's result set is only ~2000 rows.

I have tried reading many articles, but I am still struggling to optimize this.

What can I do to get better optimization here? Can I create the index in a better way? Can I make any changes to the SQL query? Also, feel free to correct me if I have misunderstood something.

This is an interesting case, because normally we like to see "Using index" in the EXPLAIN plan, but here it's a detriment.

The reason is that this is type: index, which means it's doing a full index scan: it reads the whole index, not just the entries that match your condition. That's why it shows rows: 271630. This is basically the size of your table (or at least what the optimizer estimates the size of your table to be, based on its statistics).
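To see why type: index means every entry is read, here is a toy model in Python: the index is a sorted list of (tax_year, row_id) pairs, and the two access types are compared by how many entries each one touches. The data is randomly generated, and the bisect-based seek is only a stand-in for a B-tree descent, not how InnoDB is implemented.

```python
import bisect
import random

# Toy secondary index: (tax_year, row_id) pairs kept in sorted order.
# 271,630 made-up rows with years spread evenly over 1990-2019.
rng = random.Random(0)
index = sorted((str(rng.randint(1990, 2019)), row_id) for row_id in range(271_630))
low, high = "2015", "2019"

# type: index -> read every entry and test the WHERE condition on each one.
full_scan_examined = 0
full_scan_matches = 0
for year, _row_id in index:
    full_scan_examined += 1
    if low <= year <= high:
        full_scan_matches += 1

# type: range -> seek to the first key >= low and stop after the last key <= high.
start = bisect.bisect_left(index, (low,))
end = bisect.bisect_right(index, (high, float("inf")))
range_examined = end - start

print(f"full index scan examined {full_scan_examined} entries")
print(f"range scan examined {range_examined} entries (matches: {full_scan_matches})")
```

Both paths find the same matches; the range access simply never reads the entries outside the bounds.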

In this case, I don't think adding every column to your index is helping. You would have been better off with a single-column index on tax_year.

Then I would expect EXPLAIN to show type: range because of your conditions, which indicates that the only rows examined are those that match the condition.

We'd also see filtered: 100.00, which indicates that all the rows examined are included in the result. That's good: no row was examined only to be filtered out.

Also, since your ORDER BY is on the same column, I would still expect "Using filesort" to be absent, which is also good.
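The shape of that plan can be tried without a MySQL server by using SQLite as a rough analogue. This assumes SQLite's planner behaves similarly for this simple case; it is a different optimizer, so treat it as illustration only. The table is trimmed to four columns and the index name idx_tax_year is made up. With a single-column index on tax_year, SQLite reports a SEARCH (its counterpart of a range access) and no temporary B-tree for the ORDER BY ... DESC, i.e. no sort step.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; only the plan shape matters here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tbl_tax (
        taxdata_id INTEGER PRIMARY KEY,
        tax_year   TEXT NOT NULL,
        tax_id     TEXT NOT NULL,
        owner_name TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_tax_year ON tbl_tax (tax_year)")

query = """
    SELECT tax_year, tax_id, owner_name, taxdata_id
      FROM tbl_tax
     WHERE tax_year >= '2015' AND tax_year <= '2019'
     ORDER BY tax_year DESC
"""

# Each plan row has four columns; the last one is the human-readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
conn.close()

for line in plan:
    print(line)  # expect a SEARCH ... USING INDEX idx_tax_year line
```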


Re your comment:

I suppose your condition on tax_year between 2015 and 2019 matches a significantly large subset of the table. MySQL chooses not to use an index if the condition matches a large portion of the rows, because it estimates that using the index would be more costly than just scanning the table.
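The trade-off can be illustrated with a deliberately crude cost model. The constants and the function `cheaper_plan` are invented for illustration and are not MySQL's actual cost formulas; the point is only that the index path gets more expensive in proportion to the number of matching rows, while a table scan costs roughly the same whatever the condition.

```python
# Hypothetical, simplified cost model; the constants are made up
# and are NOT MySQL's real cost constants.
SEQUENTIAL_ROW_COST = 1.0  # reading one row during a full table scan
INDEX_LOOKUP_COST = 4.0    # following one index entry back to its row


def cheaper_plan(total_rows: int, matching_rows: int) -> str:
    table_scan = total_rows * SEQUENTIAL_ROW_COST
    index_range = matching_rows * INDEX_LOOKUP_COST
    return "index range" if index_range < table_scan else "table scan"


total = 271_630
for matching in (2_000, 50_000, 100_000):
    print(f"{matching:>7} matching rows -> {cheaper_plan(total, matching)}")
```

With these made-up constants the crossover is at a quarter of the table (about 68,000 rows); the real threshold depends on MySQL's cost constants and table statistics.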

If you think the optimizer is making the wrong choice, you can give it a hint that a table scan should be assumed to be more costly:

... FROM tbl_tax FORCE INDEX(tax_year) ...

(I'm assuming the name of the index is tax_year, but you should replace that with the name of the index in your case.)

I also agree with the others that your use of varchar(255) for every attribute column is inappropriate.

INDEX(tax_year, ...) does handle that WHERE and ORDER BY.

Query includes ORDER BY in a different order than the order in which rows are accessed.

False. The WHERE does not specify an order for accessing them. In fact the EXPLAIN says "Backward index scan". All is well.
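A backward index scan needs no separate sort step: once the qualifying range is located in the sorted index, reading it from its high end already yields tax_year in descending order. A minimal Python sketch with made-up entries, where a sorted list stands in for the B-tree:

```python
import bisect

# Made-up index entries (tax_year, taxdata_id), sorted ascending
# the way a B-tree keeps its keys.
index = sorted([
    ("2013", 7), ("2015", 3), ("2016", 9), ("2016", 1),
    ("2018", 4), ("2019", 2), ("2020", 8), ("2014", 6),
])

# Locate the qualifying range once ...
start = bisect.bisect_left(index, ("2015",))
end = bisect.bisect_right(index, ("2019", float("inf")))

# ... then read it from the high end: DESC order with no sort step.
rows_desc = list(reversed(index[start:end]))
print(rows_desc)
# [('2019', 2), ('2018', 4), ('2016', 9), ('2016', 1), ('2015', 3)]
```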

Use reasonable datatypes, such as a 1-byte YEAR for tax_year instead of VARCHAR(255), which takes 5 bytes for a year (a 1-byte length prefix plus 4 latin1 characters).
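A quick check of the per-value sizes, assuming MySQL's documented storage requirements: a latin1 VARCHAR whose maximum length is at most 255 bytes stores a 1-byte length prefix plus one byte per character, and YEAR is a fixed 1-byte type. `varchar_latin1_bytes` is a hypothetical helper written just for this calculation.

```python
def varchar_latin1_bytes(value: str, max_chars: int = 255) -> int:
    """Bytes needed to store `value` in a latin1 VARCHAR(max_chars) column."""
    # latin1 uses one byte per character, so the maximum byte length is max_chars;
    # the length prefix is 1 byte up to 255 bytes and 2 bytes beyond that.
    length_prefix = 1 if max_chars <= 255 else 2
    return length_prefix + len(value.encode("latin-1"))


YEAR_BYTES = 1  # MySQL's YEAR type always takes 1 byte

print(varchar_latin1_bytes("2015"), YEAR_BYTES)  # 5 1
```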

Arithmetic on VARCHARs (for "amounts", etc.) will be messy.
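Part of the mess is that strings compare character by character. Python's string comparison is used here as a stand-in; in MySQL, ORDER BY and MAX() on a VARCHAR column likewise compare values as strings, and arithmetic relies on implicit string-to-number conversion. The amounts are made up.

```python
from decimal import Decimal

# Amounts stored as strings, as in the VARCHAR(255) columns (values made up).
amounts = ["99.50", "100.00", "7.25"]

# Character-by-character comparison: '1' < '7' < '9'.
print(sorted(amounts))  # ['100.00', '7.25', '99.50']
print(max(amounts))     # 99.50, not the largest amount

# Converting to an exact decimal type restores numeric behaviour.
values = [Decimal(a) for a in amounts]
print(max(values), sum(values))  # 100.00 206.75
```

Storing amounts in a DECIMAL column gives MySQL the same exact numeric behaviour. The same character-by-character rule is also why WHERE tax_year >= '2015' happens to work: every year has four digits, so string order and numeric order agree.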

Sure, the "covering" index helps a little. But I don't like to make indexes bigger than, say, 5 columns. Your big index helps that query some, but it also hurts INSERTs some.
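The key_len of 2831 in the EXPLAIN output can be reproduced from the index definition, which gives a feel for how wide the covering index is. This assumes MySQL's key_len conventions for latin1, NOT NULL columns (the maximum declared width, plus 2 bytes for each variable-length column); actual stored entries are shorter, since only the bytes each value uses are stored. Every INSERT must also add an entry to each secondary index, so a wider index means more work per row.

```python
# key_len arithmetic for the 12-column index tax_year_2 (latin1, NOT NULL):
# VARCHAR(255) contributes 255 bytes plus a 2-byte length in the key,
# INT contributes 4 bytes.
VARCHAR_255_KEY_BYTES = 255 + 2
INT_KEY_BYTES = 4

varchar_cols = [
    "tax_year", "owner_name", "tax_id", "display_pid", "type",
    "tax_amount", "total_due", "total_paid", "datelast_adv",
    "pmtmade_today", "paid_wcert",
]
wide_index = len(varchar_cols) * VARCHAR_255_KEY_BYTES + INT_KEY_BYTES  # + taxdata_id
single_column = VARCHAR_255_KEY_BYTES  # INDEX(tax_year) with the current VARCHAR(255)
single_year = 1                        # INDEX(tax_year) if tax_year were YEAR

print(wide_index, single_column, single_year)  # 2831 257 1
```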

(And I agree with Bill.)
