简体   繁体   中英

How can I calculate the difference between two hashes in a MySQL query?

I'm attempting to calculate the Hamming distance between an input hash and database-stored hashes. These are perceptual hashes, so the Hamming distance between them are important to me and tell me how similar two different images are (see http://en.wikipedia.org/wiki/Perceptual_hashing , http://jenssegers.com/61/perceptual-image-hashes , http://stackoverflow.com/questions/21037578/ ). Hashes are 16 hexadecimal characters long, and look like this:

b1d0c44a4eb5b5a9
1f69f25228ed4a31
751a0b19f0c2783f

My database looks like this:

CREATE TABLE `hashes` (
  `id` int(11) NOT NULL,
  `hash` binary(8) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;

INSERT INTO `hashes` (`id`, `hash`) VALUES
    (1, 0xb1d0c44a4eb5b5a9),
    (2, 0x1f69f25228ed4a31),
    (3, 0x751a0b19f0c2783f);

Now, I know I can query for a Hamming distance like so:

SELECT BIT_COUNT(0xb1d0c44a4eb5b5a9 ^ 0x751a0b19f0c2783f)

Which will output 38, as expected. However, I can't seem to reference a column name for this comparison. The following does not work as expected.

SELECT BIT_COUNT(hash ^ 0x751a0b19f0c2783f) FROM hashes

Does anyone know how I can calculate a Hamming distance like in my first SELECT query above using the columns in my database? I've tried a myriad of scenarios using hex() , unhex() , conv() , and cast() in different ways. This is in MySQL.

Update My query above appears to work as expected when running in MySQL v8 (thanks to @LukStorms for pointing this out). You can use my fiddle below and change the version in the top left. My question now is: how can I ensure the behavior works in all versions of MySQL?

Fiddle: https://www.db-fiddle.com/f/mpqsUpZ1sv2kmvRwJrK5xL/0

The problem seems to be related to your choice of datatype which is a string type. Using a numeric datatype works in MySQL 5.7 as well as 8.0:

CREATE TABLE `hashes` (
  `id` int(11) NOT NULL,
  `hash` bigint unsigned NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;

INSERT INTO `hashes` (`id`, `hash`) VALUES
    (1, 0xb1d0c44a4eb5b5a9),
    (2, 0x1f69f25228ed4a31),
    (3, 0x751a0b19f0c2783f);

SELECT id, HEX(hash), BIT_COUNT(hash ^ 0x751a0b19f0c2783f)
FROM hashes;

Output:

id  HEX(hash)           BIT_COUNT(hash ^ 0x751a0b19f0c2783f)
1   B1D0C44A4EB5B5A9    38
2   1F69F25228ED4A31    34
3   751A0B19F0C2783F    0

Demo on dbfiddle

The difference in treatment between MySQL 5.7 and 8.0 of using a string type can be seen with this query:

SELECT id, hash, HEX(hash), HEX(hash ^ 0x751a0b19f0c2783f)
FROM hashes;

MySQL 5.7:

id  hash                                                        HEX(hash)           HEX(hash ^ 0x751a0b19f0c2783f)
1   {"type":"Buffer","data":[177,208,196,74,78,181,181,169]}    B1D0C44A4EB5B5A9    751A0B19F0C2783F
2   {"type":"Buffer","data":[31,105,242,82,40,237,74,49]}       1F69F25228ED4A31    751A0B19F0C2783F
3   {"type":"Buffer","data":[117,26,11,25,240,194,120,63]}      751A0B19F0C2783F    751A0B19F0C2783F

MySQL 8.0

id  hash                                                        HEX(hash)           HEX(hash ^ 0x751a0b19f0c2783f)
1   {"type":"Buffer","data":[177,208,196,74,78,181,181,169]}    B1D0C44A4EB5B5A9    C4CACF53BE77CD96
2   {"type":"Buffer","data":[31,105,242,82,40,237,74,49]}       1F69F25228ED4A31    6A73F94BD82F320E
3   {"type":"Buffer","data":[117,26,11,25,240,194,120,63]}      751A0B19F0C2783F    0000000000000000

MySQL 8.0 is performing the XOR correctly, returning a variable, while MySQL 5.7 is returning the value being XOR'ed, indicating that it is treating the BINARY string as 0 in a numeric context.

This is not a number, so it can't used for mathematic calculations:

`hash` binary(8) NOT NULL

Use bigint instead:

`hash` bigint unsigned NOT NULL

尝试这个:

SELECT id, HEX(hash), CAST(CONV(HEX(hash),16,10) AS UNSIGNED), BIT_COUNT(CAST(CONV(HEX(hash),16,10) AS UNSIGNED) ^ 0x751a0b19f0c2783f) FROM hashes;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM