Algorithm for matching data

Original post 2011-03-12 19:18:43 4 1 algorithm/ string/ string-matching

I have a project where I am testing a device that is very sensitive to noise (electromagnetic, radio, etc.). The device generates 5-6 bytes per second of binary data (it looks like gibberish to an untrained eye) based on a given input (audio).

Depending on the noise, the device will sometimes miss characters, sometimes insert random characters, and sometimes do several of both.

I have written an app that lets the user see, on the fly, the errors the device generates (as compared to the master file, i.e. what the device should output in ideal conditions). My algorithm basically takes each byte in the live data and compares it to the byte in the same position in the known master file. If the bytes don't match, I search a window of 10 characters in both directions from the current position for a nearby match. If that matches (plus a validation or two), I visually mark up the location in the UI and register an error.
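For readers who want something concrete, here is a minimal Python sketch of one possible reading of the windowed comparison described above. It is not the poster's actual code: the function name `compare_streams`, the two-cursor bookkeeping, and the error labels are all assumptions made for illustration, and the "validation or two" step is left out.

```python
def compare_streams(live: bytes, master: bytes, window: int = 10):
    """Compare live data against the master, resyncing within +/- window bytes.

    Returns a list of (live_index, master_index, kind) error records.
    Hypothetical helper; the names and the error labels are illustrative only.
    """
    errors = []
    i = j = 0  # i indexes the live data, j indexes the master file
    while i < len(live) and j < len(master):
        if live[i] == master[j]:
            i += 1
            j += 1
            continue
        # Mismatch: look for the nearest offset where the two streams line up again.
        resynced = False
        for k in range(1, window + 1):
            # The device may have dropped k bytes: live[i] matches master[j + k].
            if j + k < len(master) and live[i] == master[j + k]:
                errors.append((i, j, f"missed {k}"))
                j += k
                resynced = True
                break
            # The device may have inserted k bytes: live[i + k] matches master[j].
            if i + k < len(live) and live[i + k] == master[j]:
                errors.append((i, j, f"inserted {k}"))
                i += k
                resynced = True
                break
        if not resynced:
            # Nothing nearby matched, so treat it as a single corrupted byte.
            errors.append((i, j, "substituted"))
            i += 1
            j += 1
    return errors
```

A single matching byte is weak evidence of a resync, which is presumably why the original adds extra validation; a real version would likely require a few consecutive bytes to agree before accepting the new alignment.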

This approach works reasonably well and, given the speed of the incoming data, runs in real time as well. However, I feel that what I am doing is not optimal and that the approach would fall apart if the data streamed at higher rates.

Are there other approaches I could take? Are there known algorithms for this type of thing?
I read many years ago that NASA's data collection outfits (e.g. the ones that communicate with craft in space and on the Moon/Mars) have had a 0.00001% loss of data despite tremendous interference in space.

Any ideas?

1 answer

I presume the main interest is the signal generated by the device? What is more important: detecting when an error has occurred, or making the signal 'robust' against such errors? I have been doing a lot of signal processing lately, and denoising a signal is part of my routine; I'm basically trying to estimate the real signal and remove any contaminants.

I don't know how the signal generated by the device is used further... If it's being recorded to a computer, then you can easily apply some denoising; try wavelet denoising, for instance. You will find packages for doing this in most languages you might choose.
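To give a rough idea of what wavelet denoising involves, here is a minimal pure-Python sketch: a single-level Haar transform with soft thresholding of the detail coefficients. The function name `haar_denoise` and the `threshold` parameter are assumptions for illustration, and this assumes the recorded signal is a sequence of numeric samples of even length. It is a toy, not a recommendation of specific settings.

```python
import math


def haar_denoise(signal, threshold):
    """One-level Haar wavelet denoising with soft thresholding.

    Hypothetical helper for illustration; `signal` must have even length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    # Forward transform: scaled pairwise sums (approximation) and differences (detail).
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    # Soft threshold: shrink each detail coefficient toward zero by `threshold`.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    # Inverse transform: rebuild each pair of samples from its two coefficients.
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out
```

With `threshold=0` the input is reconstructed unchanged (up to floating-point rounding); larger thresholds smooth away small sample-to-sample differences. In practice a multi-level transform from an existing package, for example PyWavelets in Python, would be the more usual choice.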