MySQL: Joins vs. Bitwise operator, and performance thereof

There are a number of questions about this subject, but mine is more specific to performance concerns.

With regard to an object, I want to track a multitude of 'attributes', each with a multitude of discrete 'values' (each attribute has between 3 and 16 valid 'values'). For instance, consider tracking military personnel. The attributes/values might be (not real, I totally made these up):

attribute: {values}
languages_spoken: {english, spanish, russian, chinese, …. }
certificates: {infantry, airborne, pilot, tank_driver…..}
approved_equipment: {m4, rocket_launcher, shovel, super_secret_radio_thingy….}
approved_operations: {reconnaissance, logistics, invasion, cooking, ….}
awards_won: {medal_honor, purple_heart, ….}

… and so on.

One way to do this - the way I want to do this - is to have a personnel table and an attributes table:

personnel table => [id, name, rank, address…..]
personnel_attributes table => [personnel_id, attribute_id, value_id]

along with the associated attributes and values tables.

So if personnel_id=31415 is approved for logistics, there would be the following entry in the personnel_attributes table:

personnel_id | attribute_id | value_id
31415 | 3 | 2

where 3 = attribute_id for "approved_operations" and 2 = value_id for "logistics" (sorry formatting spaces didn't line up.)

Then a search to find all personnel who speak english OR spanish, AND who are infantry OR airborne, AND who can operate a shovel OR super_secret_radio_thingy would be something like:

SELECT t1.personnel_id FROM personnel_attributes t1, personnel_attributes t2, personnel_attributes t3
WHERE ((t1.attribute_id = 1 and t1.value_id = 1) OR (t1.attribute_id = 1 and t1.value_id = 2))
AND ((t2.attribute_id = 2 and t2.value_id = 1) OR (t2.attribute_id = 2 and t2.value_id = 2))
AND ((t3.attribute_id = 3 and t3.value_id = 3) OR (t3.attribute_id = 3 and t3.value_id = 4))
AND t2.personnel_id = t1.personnel_id
AND t3.personnel_id = t1.personnel_id;

Assuming this isn't a totally stupid way to write the SQL query, the problem is that it's very slow (even with seemingly relevant indexes).
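For comparison, the search described above (OR within an attribute, AND across attributes) can also be phrased with one EXISTS test per attribute. The following is only a minimal sketch: it uses Python's sqlite3 module instead of MySQL, and the sample rows are made up for illustration. The table and column names follow the question. It says nothing about how MySQL would plan or time such a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel_attributes ("
    " personnel_id INTEGER, attribute_id INTEGER, value_id INTEGER)"
)

# Hypothetical sample rows: (personnel_id, attribute_id, value_id)
rows = [
    (1, 1, 1), (1, 2, 2), (1, 3, 4),  # person 1 satisfies all three groups
    (2, 1, 2), (2, 2, 1),             # person 2 has no row for attribute 3
    (3, 1, 3), (3, 2, 1), (3, 3, 3),  # person 3 has neither wanted language
]
conn.executemany("INSERT INTO personnel_attributes VALUES (?, ?, ?)", rows)

# One EXISTS per additional attribute; IN expresses the OR inside a group.
query = """
SELECT DISTINCT t1.personnel_id
FROM personnel_attributes t1
WHERE t1.attribute_id = 1 AND t1.value_id IN (1, 2)
  AND EXISTS (SELECT 1 FROM personnel_attributes t2
              WHERE t2.personnel_id = t1.personnel_id
                AND t2.attribute_id = 2 AND t2.value_id IN (1, 2))
  AND EXISTS (SELECT 1 FROM personnel_attributes t3
              WHERE t3.personnel_id = t1.personnel_id
                AND t3.attribute_id = 3 AND t3.value_id IN (3, 4))
ORDER BY t1.personnel_id
"""
matches = [row[0] for row in conn.execute(query)]
print(matches)
```

Each alias filters on its own value_id, so a person only matches when a separate row exists for every attribute group.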

So I am toying with using bitwise operators instead, where each attribute is a column in a table and each value is a bit. The same search would be:

SELECT personnel_id FROM personnel_attributes
WHERE language & b'00000011'
AND certificates & b'00000011'
AND approved_operations & b'00001100';

I know this does a full table scan, but in my experiments with 350,000 sample personnel, and 16 attributes each, the first method took 20 seconds whereas the bitwise method took 38 milliseconds!
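To make the bit encoding concrete, here is a minimal Python sketch of how value positions could map to mask bits and how the `&` test behaves. The ordering of the values and the helper name `mask_for` are assumptions for illustration, not something given in the question.

```python
# Hypothetical ordering: bit 0 = english, bit 1 = spanish, and so on.
LANGUAGES = ["english", "spanish", "russian", "chinese"]

def mask_for(values, universe):
    """Return an integer with one bit set per value, by position in universe."""
    mask = 0
    for value in values:
        mask |= 1 << universe.index(value)
    return mask

wanted = mask_for(["english", "spanish"], LANGUAGES)    # binary 0011
person_a = mask_for(["spanish", "russian"], LANGUAGES)  # binary 0110
person_b = mask_for(["chinese"], LANGUAGES)             # binary 1000

# A non-zero AND means at least one wanted value is present (OR semantics).
a_matches = (person_a & wanted) != 0
b_matches = (person_b & wanted) != 0

# Requiring ALL wanted values would compare the AND result to the mask itself.
a_has_all = (person_a & wanted) == wanted

print(bin(wanted), a_matches, b_matches, a_has_all)
```

With at most 16 values per attribute, each mask fits in a 16-bit integer, which is why one column per attribute is enough in this scheme.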

Am I doing something wrong here? Are these the performance results I should expect?

Thanks!

I have the same issue: whether to use django-bitfield or a separate table for flags.

Inspired by your experiment, I used a 3.5 million record table (InnoDB) and ran count() and retrieve queries for both variants. The result was astonishing: roughly 5 seconds vs. 40 seconds, with the bitfield winning.

Using the bitwise operation will require evaluating all of the rows. I believe your problem can be solved with a change to your original SELECT statement and how you're joining your tables.

To make it a little easier to read, I've changed the attribute values to words instead of integers so it's less confusing while reading through my example, but obviously you can leave them as integers and the concept would still work:

CREATE TABLE PERSONNEL (
    ID INT,
    NAME VARCHAR(20)
);

CREATE TABLE PERSONNEL_ATTRIBUTES (
    PERSONNEL_ID INT,
    ATTRIB_ID INT,
    ATTRIB_VALUE VARCHAR(20)
);

INSERT INTO PERSONNEL VALUES (1, 'JIM SMITH');
INSERT INTO PERSONNEL VALUES (2, 'JANE DOE');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Spanish');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Russian');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Logistics');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Infantry');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 3, 'Infantry');

SELECT P.ID, P.NAME, PA1.ATTRIB_VALUE AS DESIRED_LANGUAGE, PA2.ATTRIB_VALUE AS APPROVED_OPERATION
FROM PERSONNEL P
JOIN PERSONNEL_ATTRIBUTES PA1 ON P.ID = PA1.PERSONNEL_ID AND PA1.ATTRIB_ID = 1
JOIN PERSONNEL_ATTRIBUTES PA2 ON P.ID = PA2.PERSONNEL_ID AND PA2.ATTRIB_ID = 3
WHERE PA1.ATTRIB_VALUE = 'Spanish' AND (PA2.ATTRIB_VALUE = 'Infantry' OR PA2.ATTRIB_VALUE = 'Airborne');
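
As a quick sanity check, the example above can be run against an in-memory SQLite database from Python. This is only a sketch under the assumption that SQLite behaves like MySQL for these simple statements. It shows which rows the join returns and is not a performance measurement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same schema and sample rows as the example above.
conn.executescript("""
CREATE TABLE PERSONNEL (ID INT, NAME VARCHAR(20));
CREATE TABLE PERSONNEL_ATTRIBUTES (
    PERSONNEL_ID INT, ATTRIB_ID INT, ATTRIB_VALUE VARCHAR(20));
INSERT INTO PERSONNEL VALUES (1, 'JIM SMITH');
INSERT INTO PERSONNEL VALUES (2, 'JANE DOE');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Spanish');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 1, 'Russian');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Logistics');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (1, 3, 'Infantry');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 1, 'English');
INSERT INTO PERSONNEL_ATTRIBUTES VALUES (2, 3, 'Infantry');
""")

# One join per attribute; each alias filters on its own ATTRIB_ID and value.
result = conn.execute("""
SELECT P.ID, P.NAME, PA1.ATTRIB_VALUE, PA2.ATTRIB_VALUE
FROM PERSONNEL P
JOIN PERSONNEL_ATTRIBUTES PA1 ON P.ID = PA1.PERSONNEL_ID AND PA1.ATTRIB_ID = 1
JOIN PERSONNEL_ATTRIBUTES PA2 ON P.ID = PA2.PERSONNEL_ID AND PA2.ATTRIB_ID = 3
WHERE PA1.ATTRIB_VALUE = 'Spanish'
  AND (PA2.ATTRIB_VALUE = 'Infantry' OR PA2.ATTRIB_VALUE = 'Airborne')
""").fetchall()
print(result)
```

Only Jim Smith is returned, since Jane Doe has no 'Spanish' row under attribute 1.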
