
Database indexing and lookup with “closest neighbour”, not exact match

I'm dealing with an interesting issue.

I have a biometric system that uses John Daugman's algorithm to transform human irises into binary codes (for some research at our university).

The iris code is "flat" (it's not stored as a circle, but transformed into a rectangle):

column 1 | column 2 | column 3 | ...

10011001 ...
10110111
01100010
...

Each column represents 30 bits. The problem is that each iris scan has its own noise mask (eyelids, reflections...), so matches are never 100% but at best around 96-98%.

So far we are using an algorithm like this (Hamming distance matching):

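// only bits that are valid in both scans are compared
mask = mask1 & mask2;
// bits that differ between the two codes, restricted to the valid region
result = (code1 ^ code2) & mask;

// ratio of differing bits to the bits allowed by the mask
double difference = (double)one_bits(result) / one_bits(mask);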
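The snippet above works on a single word. As a minimal sketch, assuming each 30-bit column is stored in its own 32-bit word (so one code would be 80 words), the same masked comparison over a whole code might look like the following. The name `masked_hamming`, the storage layout, and the choice to return 1.0 when no bits are valid in both scans are assumptions for illustration, not details of the actual system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable population count (Kernighan's method): one loop
 * iteration per set bit. */
unsigned one_bits(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Fraction of differing bits among the bits valid in both masks.
 * Assumption: returns 1.0 (maximally different) when no bit is
 * valid in both scans, since nothing can be compared. */
double masked_hamming(const uint32_t *code1, const uint32_t *mask1,
                      const uint32_t *code2, const uint32_t *mask2,
                      size_t words)
{
    assert(code1 && mask1 && code2 && mask2);

    unsigned long diff = 0, valid = 0;
    for (size_t i = 0; i < words; i++) {
        uint32_t mask = mask1[i] & mask2[i];
        diff  += one_bits((code1[i] ^ code2[i]) & mask);
        valid += one_bits(mask);
    }
    return valid ? (double)diff / (double)valid : 1.0;
}
```

Most of the time in a full database scan goes into the bit counting, so replacing `one_bits` with a hardware population count (where the compiler offers one) is usually the first thing worth trying before any indexing.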

The problem is that we are now building a real database of irises (around 1200-1300 subjects, each with 3-5 iris samples, and you have to account for rotation, so you need around 10 tests for each sample). We need to compare the current sample against the whole database (65 000 comparisons on 80*30 bits), which turns out to be slow.

Question: are there any hash functions that reflect the structure of the data (and change only a little when a few bits change), or that are "error tolerant"? We need to build a fast search algorithm over the whole database (so we are looking for possible ways to index this).

UPDATE: I guess it should be implemented by some sort of "closest neighbour" lookup, or by some sort of clustering (where similar irises would be grouped and only some representatives would be checked in the first round).

Check Locality Sensitive Hashing (LSH), implementations like this.

"A nilsimsa code is something like a hash, but unlike hashes, a small change in the message results in a small change in the nilsimsa code. Such a function is called a locality-sensitive hash."

How to understand Locality Sensitive Hashing?
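For fixed-length binary codes compared by Hamming distance, one well-known LSH family is bit sampling: each hash table reads a fixed, randomly chosen subset of bit positions, and the sampled bits form the bucket key. Below is a rough sketch of that idea, not the nilsimsa code quoted above. It assumes codes are packed 32 bits per word (unlike the 30-bit columns in the question), and `lsh_table`, `lsh_init`, `lsh_key` and `KEY_BITS` are hypothetical names. It also ignores the noise masks, so sampled positions that fall under an eyelid or reflection add extra mismatches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_BITS 16  /* hypothetical: number of bits sampled per table */

/* One table's sampling pattern: KEY_BITS positions in [0, total_bits). */
typedef struct {
    size_t pos[KEY_BITS];
} lsh_table;

/* Small xorshift PRNG so a pattern is reproducible from its seed. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Choose the sampled bit positions (with replacement) for one table. */
void lsh_init(lsh_table *t, size_t total_bits, uint32_t seed)
{
    assert(t != NULL && total_bits > 0);
    uint32_t state = seed ? seed : 1u;  /* xorshift state must be nonzero */
    for (int i = 0; i < KEY_BITS; i++)
        t->pos[i] = xorshift32(&state) % total_bits;
}

/* Bucket key: the sampled bits of `code`, packed into one integer.
 * Bit p of the code is assumed to live in word p / 32, at bit p % 32. */
uint32_t lsh_key(const lsh_table *t, const uint32_t *code)
{
    uint32_t key = 0;
    for (int i = 0; i < KEY_BITS; i++) {
        size_t p = t->pos[i];
        uint32_t bit = (code[p / 32] >> (p % 32)) & 1u;
        key = (key << 1) | bit;
    }
    return key;
}
```

If two codes differ in a fraction d of their bits, they get the same key in one table with probability (1 - d)^KEY_BITS, which is roughly 0.61 for d = 0.03 and 16 sampled bits. In practice you would build several tables with different seeds, store every database code under its key in each table, and at query time run the full masked Hamming comparison only on the candidates found in the matching buckets. Rotation would still need handling, for example by querying with each rotated variant of the sample.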
