Unique, but realistic, object hash code

OK, I am designing a piece of software that will keep one system synced with another. The problem is that the originating system is a legacy DB2 nightmare: I only have read-only access, and the tables have no timestamping whatsoever, so there is no way to detect which rows were changed.

My idea is to load all the rows (about 60,000 in total, synced every half hour), calculate their hashes, and keep <ID, hash> tuples in my integration database. Change detection then becomes a matter of comparing hashes and updating records in the destination system where the hashes mismatch or the tuple is missing altogether. I forgot to mention that reading the source is cheap but updating the destination is expensive: it's a web service with a lot of background processing, so I want to avoid updating everything every time.
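The comparison step described above might look roughly like the sketch below. The class and method names are hypothetical, and it assumes integer IDs and string hashes; `stored` stands for the <ID, hash> tuples from the integration database and `current` for the hashes just computed from the freshly loaded source rows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class ChangeDetector
{
    // Returns the IDs that need to be pushed to the destination system:
    // rows with no stored tuple (new) or with a different hash (modified).
    public static List<int> FindChanged(
        IReadOnlyDictionary<int, string> stored,
        IReadOnlyDictionary<int, string> current)
    {
        var changed = new List<int>();
        foreach (var pair in current)
        {
            // Missing tuple => new row; mismatching hash => modified row.
            if (!stored.TryGetValue(pair.Key, out var oldHash) || oldHash != pair.Value)
            {
                changed.Add(pair.Key);
            }
        }
        return changed;
    }
}
```

Only the returned IDs would then be sent to the expensive web service, and their tuples updated in the integration database afterwards.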

Now, my problem: C#'s built-in hash code (GetHashCode) is documented as unsuitable for this purpose (equal hashes do not imply equal objects), and crypto hashes seem like big overkill at 256+ bits. I don't think more than 64 bits is needed; given a perfectly distributed hash, that would give me about a 1 in 10^10 chance of collision and allow fast hash comparison on the x64 architecture.
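For reference, the 1 in 10^10 figure is consistent with the usual birthday approximation, assuming n = 60,000 rows and an ideal, uniformly distributed b = 64-bit hash (the symbols n and b are introduced here for illustration):

```latex
P(\text{any collision}) \approx \frac{n(n-1)}{2 \cdot 2^{b}}
  \approx \frac{n^{2}}{2^{b+1}}
  = \frac{(6 \times 10^{4})^{2}}{2^{65}}
  \approx \frac{3.6 \times 10^{9}}{3.69 \times 10^{19}}
  \approx 10^{-10}
```

This is the chance that any two of the 60,000 rows share a hash. The chance that one particular row changes and still produces its old hash is much smaller, about 2^-64.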

So what should I use to generate unique hashes?

Another option: calculate the hash in C# using a function like this:

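using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// The wrapper class name is arbitrary.
public static class RowSignature
{
    // HashAlgorithm instances are not thread-safe; use one per thread if hashing in parallel.
    private static readonly HashAlgorithm hash = SHA1.Create();

    public static string CalculateSignature(IEnumerable<object> values)
    {
        var sb = new StringBuilder();
        foreach (var value in values)
        {
            // Invariant culture keeps number and date formatting identical on every machine.
            string valueToHash = value == null ? ">>null<<" : Convert.ToString(value, CultureInfo.InvariantCulture);
            // The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
            sb.Append(valueToHash).Append('\0');
        }
        var bytesToHash = Encoding.UTF8.GetBytes(sb.ToString());
        var hashedBytes = hash.ComputeHash(bytesToHash);

        // Hash bytes are arbitrary binary, so decoding them as UTF-8 is lossy; Base64 is not.
        return Convert.ToBase64String(hashedBytes);
    }
}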
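If a 64-bit key is preferred over the full digest, one possible approach (an assumption on my part, not something the function above does) is to keep only the first 8 bytes of the SHA-1 output. The `Hash64` class below is a hypothetical helper; it takes the same NUL-joined row string that `CalculateSignature` builds.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: a SHA-1 digest truncated to 64 bits.
public static class Hash64
{
    public static long Compute(string signature)
    {
        using (var sha1 = SHA1.Create())
        {
            // SHA-1 yields 20 bytes; only the first 8 are used here.
            byte[] digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(signature));
            // BitConverter follows the machine's byte order, so compute and
            // compare these values on the same architecture.
            return BitConverter.ToInt64(digest, 0);
        }
    }
}
```

A `long` fits a BIGINT column and compares in a single instruction on x64, at the cost of a higher collision probability than the full 160-bit digest.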

Edit: Hashing profiling test

To show how fast SHA1 hashing is, here's a quick test. On my dev machine, I get 60,000 hashes in 176 ms. MD5 takes 161 ms.

var hash = System.Security.Cryptography.MD5.Create();

var stringtoHash = "3490518cvm90wg89puse5gu3tgu3v0afgmvkldfjgmvvvvvsh,9semc9petgucm9234ucv0[vhd,flhgvzemgu904vq2m0";

var sw = System.Diagnostics.Stopwatch.StartNew();
for (var i = 0; i < 60000; i++)
{
    var bytesToHash = Encoding.UTF8.GetBytes(stringtoHash);
    var hashedBytes = hash.ComputeHash(bytesToHash);
    // Base64 rather than Encoding.UTF8.GetString: hash bytes are not valid UTF-8 text.
    var signature = Convert.ToBase64String(hashedBytes);
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);

In your staging SQL tables, add a 'checksum' column, using SQL's checksum function.

Something like this:

update mysourcetable set [check] = checksum(id, field1, field2, field3, field4 ...)

Clarification

You mentioned having an integration database; my thought was that you would read the data from DB2 into an interim database, like SQL Server, where you're already storing ID/hash pairs. If you copied all the data out of DB2, not just the IDs, then you could calculate the checksum in the integration database.
