简体   繁体   中英

Optimizing an index in a large MySQL table

I have a large table (about 3 million records) that includes primarily these fields: rowID (int), a deviceID (varchar(20)), a UnixTimestamp in a format like 1536169459 (int(10)), powerLevel which has integers that range between 30 and 90 (smallint(6)).

I'm looking to pull out records within a certain time range (using UnixTimestamp) for a particular deviceID and with a powerLevel above a certain number. With over 3 million records, it takes a while. Is there a way to create an index that will optimize for this?

Create an index over:

DeviceId,
PowerLevel,
UnixTimestamp

When selecting, you will first narrow in to the set of records for your given Device, then it will narrow in to only those records that are in the correct PowerLevel range. And lastly, it will narrow in, for each PowerLevel, to the correct records by UnixTimestamp.

If I understand you correctly, you hope to speed up this sort of query.

SELECT something
  FROM tbl
 WHERE deviceID = constant
   AND start <= UnixTimestamp
   AND UnixTimestamp < end
   AND Power >= constant

You have one constant criterion (deviceID) and two range critera (UnixTimestamp and Power). MySQL's indexes are BTREE (think sorted in order), and MySQL can only do one index range scan per SELECT.

So, you should probably choose an index on (deviceID, UnixTimestamp, Power) . To satisfy the query, MySQL will random-access the index to the entries for deviceID, then further random access to the first row meeting the UnixTimestamp start criterion.

It will then scan the index sequentially, and use the Power information from each index entry to decide whether it should choose each row.

You could also use (deviceID, Power, UnixTimestamp) . But in this case MySQL will find the first entry matching the device and power criteria, then scan the index to look at entries will all timestamps to see which rows it should choose.

Your performance objective is to get MySQL to scan the fewest possible index entries, so it seems very likely the (deviceID, UnixTimestamp, Power) choice is superior. The index column on UnixTimestamp is probably more selective than the one on Power. (That's my guess.)

ALTER TABLE tbl CREATE INDEX tbl_dev_ts_pwr (deviceID, UnixTimestamp, Power);

Look at Bill Karwin's tutorials. Also look at Markus Winand's https://use-the-index-luke.com

The suggested 3-column indexes are only partially useful. The Optimizer will use the first 2 columns, but ignore the third.

Better:

INDEX(DeviceId, PowerLevel),
INDEX(DeviceId, UnixTimestamp)

Why?

The optimizer will pick between those two based on which seems to be more selective. If the time range is 'narrow', then the second index will be used; if there are not many rows with the desired PowerLevel, then the first index will be used.

Even better...

The PRIMARY KEY ... You probably have Id as the PK? Perhaps (DeviceId, UnixTimestamp) is unique? (Or can you have two readings for a single device in a single second??) If the pair is unique, get rid of Id completely and have

PRIMARY KEY(DeviceId, UnixTimestamp),
INDEX(DeviceId, PowerLevel)

Notes:

  • Getting rid of Id saves space, thereby providing a little bit of speed.
  • When using a secondary index, the executing spends time bouncing between the index's BTree and the data BTree (ordered by the PK). By having PRIMARY KEY(Id) , you are guaranteed to do the bouncing. By changing the PK to this, the bouncing is avoided. This may double the speed of the query.
  • (I am not sure the secondary index will every be used.)

Another (minor) suggestion: Normalize the DeviceId so that it is (perhaps) a 2-byte SMALLINT UNSIGNED (range 0..64K) instead of VARCHAR(20) . Even if this entails a JOIN , the query will run a little faster. And a bunch of space is saved.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM