
Indexed column and not indexed column research

I generated separate MySQL InnoDB tables with 2000, 5000, 10000, 20000, 50000, 100000 and 200000 rows (using a PHP loop and INSERT queries). Each table has two columns: id (INT AUTO_INCREMENT, PRIMARY KEY) and number (INT, UNIQUE KEY). I then generated a second set of tables that are the same except that the number column has no index. In every table the value of number equals id + 2: the first row has number = 3, the 1000th row has number = 1002, and so on. I wanted to test a query like this one, because it will be used in my application:

SELECT count(number) FROM number_two_hundred_I WHERE number=200002;

After generating the data I wanted to measure the time of the worst-case query for each table, using SHOW PROFILES. I assumed the worst case would be the lookup of the last row in each table, i.e. number = 2002, 5002, and so on. Here are the queries I tested, with the times reported by SHOW PROFILES:

SELECT count(number) FROM number_two_thousand_I WHERE number=2002;
-- Tables whose number column is indexed have the suffix _I at the end
-- of the table name. Time for this query: 0.00099250
SELECT count(number) FROM number_two_thousand WHERE number=2002;
-- The number column is not indexed when the table name has no _I suffix.
-- Time for this query: 0.00226275
SELECT count(number) FROM number_five_thousand_I WHERE number=5002;
-- 0.00095600
SELECT count(number) FROM number_five_thousand WHERE number=5002;
-- 0.00404125

So here are the results (times in seconds, as reported by SHOW PROFILES):

  1. 2000 el - indexed 0.00099250, not indexed 0.00226275
  2. 5000 el - indexed 0.00095600, not indexed 0.00404125
  3. 10000 el - indexed 0.00156900, not indexed 0.00761750
  4. 20000 el - indexed 0.00155850, not indexed 0.01452820
  5. 50000 el - indexed 0.00051100, not indexed 0.04127450
  6. 100000 el - indexed 0.00121750, not indexed 0.07120075
  7. 200000 el - indexed 0.00095025, not indexed 0.11406950

Here is a chart of these results. It shows how the worst-case query time depends on the number of rows for the indexed and the non-indexed column; the indexed case is drawn in red. When measuring, I ran each query twice in the mysql console, because I noticed that on the first run the query on the non-indexed column can sometimes be even a bit faster than the one on the indexed column.

My question: why does this query sometimes take less time on 200000 rows than on 100000 rows when the number column is indexed? There are other results I cannot explain either. I ask because when the number column is not indexed the results are quite predictable: the time for 200000 rows is always larger than for 100000. Please tell me what I am doing wrong in my research on the UNIQUE indexed column.

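Without an index it is always a full table scan, so the time tracks the number of rows closely. With an index you are measuring the index lookup time, which is constant in your case (small numbers, small deviations).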

It is not the "worst" case.

  • Make the UNIQUE key random instead of being in lock step with the PK. UUID() is one way to get such values.
  • Generate enough rows so that the table and index(es) cannot fit in the buffer_pool.

If you do both of those, you will eventually see the performance slow down significantly.
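
To illustrate the first point while keeping the number column an INT, here is a minimal sketch that produces the same set of values as in the question but in random order relative to id. This swaps shuffling in for UUID(); the function names are made up for the example, and the values would still have to be inserted by your own PHP loop or equivalent.

```python
import random


def lockstep_values(n):
    """Values as generated in the question: number = id + 2."""
    return [row_id + 2 for row_id in range(1, n + 1)]


def random_unique_values(n, seed=0):
    """Same unique values, but shuffled so they no longer follow the PK order."""
    values = lockstep_values(n)
    random.Random(seed).shuffle(values)  # fixed seed keeps the test data reproducible
    return values


# Pair each id with its shuffled number, ready to be turned into INSERTs.
rows = list(enumerate(random_unique_values(10), start=1))
for row_id, number in rows:
    print(row_id, number)
```

Because the values are only reordered, the UNIQUE constraint still holds; what changes is that consecutive inserts land in unrelated parts of the index's BTree.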

UNIQUE keys have the following impact on INSERTs: the uniqueness constraint is checked before returning to the client. For a non-UNIQUE index, the work to insert into the index's BTree can be (and is) delayed (cf. "Change buffer"). With no index on the second column, there is even less work to do.

WHERE number=2002 --

  • With UNIQUE(number) -- Drill down the BTree. Very fast, very efficient.
  • With INDEX(number) -- Drill down the BTree. Very fast, very efficient. However it is slightly slower since it can't assume there is only one such row. That is, after finding the right spot in the BTree, it will scan forward (very efficient) until it finds a value other than 2002.
  • With no index on number -- Scan the entire table. So the cost depends on table size, not the value of number. It has no clue if 2002 exists anywhere in the table, or how many times. If you plot the times you got, you will see that it is rather linear.
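
One quick way to check the "rather linear" claim without plotting is to divide each non-indexed time by the row count. The sketch below uses only the figures posted in the question; the helper name is made up for the example.

```python
# Non-indexed timings from the question: row count -> seconds (SHOW PROFILES).
not_indexed = {
    2000: 0.00226275,
    5000: 0.00404125,
    10000: 0.00761750,
    20000: 0.01452820,
    50000: 0.04127450,
    100000: 0.07120075,
    200000: 0.11406950,
}


def microseconds_per_row(timings):
    """Cost per scanned row; roughly constant values indicate linear growth."""
    return {rows: seconds / rows * 1e6 for rows, seconds in timings.items()}


for rows, per_row in microseconds_per_row(not_indexed).items():
    print(f"{rows:>7} rows: {per_row:.2f} us per row")
```

On these figures the cost stays at roughly 0.6 to 1.1 microseconds per row while the table grows 100-fold, which is what a full table scan should look like.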

I suggest you use log-log 'paper' for your graph. Anyway, note how linear the non-indexed case is. And the indexed case is essentially constant. Finding number=200002 is just as cheap as finding number=2002. This applies for UNIQUE and INDEX. (Actually, there is a very slight rise in the line because a BTree is really O(log n), not O(1). For 2K rows, there are probably 2 levels in the BTree; for 200K, 3 levels.)
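
The level counts above can be reproduced with a back-of-the-envelope estimate. The sketch below assumes a fanout of about 100 entries per BTree block, which is only a rule of thumb; the real fanout depends on key and row size, so treat the function and its default as illustrative.

```python
def btree_levels(rows, fanout=100):
    """Estimate BTree depth: the smallest number of levels whose capacity covers `rows`.

    fanout=100 is an assumed rule-of-thumb value, not something measured on your tables.
    """
    levels = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


for rows in (2000, 5000, 10000, 20000, 50000, 100000, 200000):
    print(f"{rows:>7} rows: about {btree_levels(rows)} levels")
```

Under that assumption a 100-fold increase in rows adds a single level, which is why the indexed timings show no visible trend.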

The Query cache can trip you up in timings (if it is turned on). When timing, do SELECT SQL_NO_CACHE ... to avoid the QC. If the QC is on and applies, then the second and subsequent runs of the identical query will take very close to 0.000 seconds.

Those timings that varied between 0.5ms and 1.2ms -- chalk it up to the phase of the moon. Seriously, any timing below 10ms should not be trusted. This is because of all the other things that may be happening on the computer at the same time. You can temper it somewhat by averaging multiple runs -- being sure to avoid (1) the Query cache, and (2) I/O.

As for I/O... This gets back to my earlier comment about what may happen when the table (and/or index) is bigger than can be cached in RAM.

  • When smaller than RAM, the first run is likely to fetch stuff from disk. The second and subsequent runs are likely to be faster and consistent.
  • When bigger than RAM, all runs may need to hit the disk. Hence, all may be slow, and perhaps more flaky than the variations you found.

Your tags are, technically, incorrect. Most of MySQL's indexes are BTrees (actually B+Trees), not Binary Trees. (Sure, there is a lot of similarity, and many of the principles are shared.)

Back to your research goal.

  • Assume there is 'background noise' that is messing with your figures.
  • Make your tests non-trivial (e.g. the non-indexed case) so that they overwhelm the noise, or
  • Repeat the timings to mask the issue. And be sure to ignore the first run.
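
A minimal sketch of the "repeat and ignore the first run" approach follows. Here run_query is a placeholder for whatever executes your SELECT through your own client code; the lambda at the bottom is only a stand-in workload so the example runs by itself. Note that this measures wall-clock time on the client side, so it includes the round trip, unlike SHOW PROFILES.

```python
import statistics
import time


def time_repeated(run_query, repeats=20):
    """Call run_query() repeats + 1 times and summarise all but the first (cold) run."""
    samples = []
    for _ in range(repeats + 1):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    warm = samples[1:]  # drop the first run, which may include disk I/O
    return {
        "runs": len(warm),
        "median": statistics.median(warm),
        "mean": statistics.mean(warm),
        "stdev": statistics.stdev(warm),  # needs repeats >= 2
    }


# Stand-in workload; replace with a function that sends the query to MySQL.
result = time_repeated(lambda: sum(range(10_000)), repeats=10)
print(result)
```

The median is less sensitive than the mean to the occasional slow run caused by other activity on the machine, and the standard deviation gives a feel for how much of a difference between two tables is just noise.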

The main cost in performing any SELECT is how many rows it touches.

  • With your UNIQUE index, it is touching 1 row. So expect fast and O(1) (plus noise).
  • Without an index, it is touching N rows for an N-row table. So expect O(N).


 