
Real-time data matching algorithm

I have a sequence of integer input values and a sequence of expected values, e.g. (input first, expected second):

000033335502200008777
000033335552200007777

In this sample the zeros are empty input that should be ignored, and the non-zero values form groups:

3333
555
22
7777

The input data could have:

  1. a different group length (550 instead of 555)
  2. a group shift (0055500 vs. 0555000)
  3. wrong (but close) values (8777 instead of 7777)

For each such group I would like to have a matching ratio like this:

3333 (100%)
555  (66.67%)
22 (100%)
7777 (75%)

The important point is that I need each ratio as soon as the processing of its group is finished:

first ratio after 8 values
second ratio after 11 values
third ratio after 13 values
fourth ratio after 21 values
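
To make the timing requirement concrete, here is a minimal Python sketch of the simplest possible approach: a position-by-position comparison that emits a ratio at the last value of each expected group. It assumes the expected sequence is known in advance (so the end of a group can be detected by looking one value ahead), that a group is a run of identical non-zero expected values, and that the input arrives one value at a time. The function name `stream_ratios` is hypothetical. It reproduces the ratios above for this sample, but it does not compensate for group shifts.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def stream_ratios(
    inputs: Iterable[int], expected: Sequence[int]
) -> Iterator[Tuple[int, str, float]]:
    """Yield (values_seen, expected_group, ratio) when each expected group ends.

    Assumption: a group is a run of identical non-zero expected values.
    """
    hits = size = 0
    for i, (got, want) in enumerate(zip(inputs, expected)):
        if want == 0:
            continue  # zeros in the expected data are ignored
        size += 1
        hits += int(got == want)
        # The group ends here if this is the last expected value
        # or the next expected value differs (zero or another group).
        is_last = i + 1 == len(expected) or expected[i + 1] != want
        if is_last:
            yield i + 1, str(want) * size, hits / size
            hits = size = 0


sample_input = [int(c) for c in "000033335502200008777"]
sample_expected = [int(c) for c in "000033335552200007777"]

for seen, group, ratio in stream_ratios(sample_input, sample_expected):
    print(f"after {seen} values: {group} ({ratio:.2%})")
```

Because the comparison is strictly positional, a shifted group such as 0555000 against 0055500 would be penalized; handling that needs some form of alignment rather than an index-by-index check.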

What algorithm/approach should I choose?

Thank you in advance!

Actually, there are algorithms from computational biology and genetics, and from the field of sequence pattern mining, that might be suited to fast matching of number sequences.

Check "A FAST Pattern Matching Algorithm" by S. S. Sheik, Sumit K. Aggarwal, Anindya Poddar, N. Balakrishnan, and K. Sekar.

Also, it looks like you could benefit from algorithms that match the components of the strings against each other.

Some well-known ones are Smith-Waterman and Needleman-Wunsch. For direct string matching I suggest looking into Jaro-Winkler and Monge-Elkan.
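
As an illustration of how an alignment algorithm tolerates a group shift, here is a minimal Python sketch of the Needleman-Wunsch global alignment score. The scoring values (match +1, mismatch -1, gap -1) and the function name are illustrative assumptions, not tuned for this problem, and the sketch returns only the score, not the alignment itself.

```python
from typing import Sequence


def needleman_wunsch_score(
    a: Sequence, b: Sequence, match: int = 1, mismatch: int = -1, gap: int = -1
) -> int:
    """Return the optimal global alignment score of a and b.

    Uses the standard dynamic programming recurrence, keeping only
    the previous row of the score matrix.
    """
    # Row 0: aligning an empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # column 0: a[:i] aligned with an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # gap in b
            left = cur[j - 1] + gap  # gap in a
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


# A shifted group: six matches and two gaps beat the positional comparison.
print(needleman_wunsch_score("0055500", "0555000"))
# A wrong but close value: three matches and one mismatch.
print(needleman_wunsch_score("8777", "7777"))
```

To turn the score into a per-group ratio it would have to be normalized, for example by the score of a perfect match, and the alignment would have to be run on the window of values around each expected group as soon as that window is complete.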
