Generating verifiable random numbers - Java

Question

I am trying to validate a proprietary database (actually a file system, but for this discussion I want to keep things simple). The database has the following properties:

- It can have either 1 or 2 primary keys, and they MUST be integers.
- Columns can be string (non-ASCII permitted), integer, long, or datetime.

I want to validate that the values I ask this database to store are stored correctly across a large number of records (> 500k records). To do this, I want to extend a tool that generates data I can easily validate later.

So basically, say this is the sample schema:

pk1 (int - primary key)
pk2 (int - primary key)
s1 (string)
l1 (long)
i1 (int)

I want to generate 500k records with this tool. Then, at any given time, I want to be able to sanity-check a given record. I might perform a series of operations (say, back up and then restore the database), and then spot-check a few records. So I want to be able to quickly validate that the record for primary key (pk1 = 100, pk2 = 1) is valid.

What is the best way to generate the values for each column so that they can be easily validated later? The values need not be fully random, but they should not repeat frequently either, so that some of the compression logic gets exercised too.

As an example, say the tool "somehow" generated the following values for a row:

pk1 = 1000
pk2 = 1
s1 = "foobar"
l1 = 12345
i1 = 17

Now I perform several operations, and I want to validate that at the end of them this row has not been corrupted. Given pk1 = 1000 and pk2 = 1, I have to be able to quickly generate the expected values for s1, l1, and i1, so that the row can be validated really quickly.
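One common way to get values that can be regenerated from the keys alone is to seed a pseudo-random generator with the primary keys. The sketch below assumes java.util.Random, whose algorithm is specified in its Javadoc, so the same seed and the same sequence of calls yield the same values on any JVM. Because Random only uses the low 48 bits of its seed, the packed keys are first scrambled with the MurmurHash3 fmix64 finalizer (a swapped-in mixing step, not part of the question). The class name DeterministicRows, its methods, the string alphabet and the length range are all hypothetical; records require Java 16 or later.

```java
import java.util.Random;

/** Hypothetical helper: every non-key column is a pure function of (pk1, pk2). */
final class DeterministicRows {

    record Row(int pk1, int pk2, String s1, long l1, int i1) {}

    // Hypothetical alphabet: ASCII letters plus a few non-ASCII characters,
    // written as Unicode escapes so the source file's encoding does not matter.
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyz\u00e4\u00f6\u00fc\u00df\u00e9\u00f1\u6570\u636e";

    /** Packs both keys into one long, then mixes all 64 bits (Random keeps only the low 48). */
    static long seedFor(int pk1, int pk2) {
        long z = ((long) pk1 << 32) | (pk2 & 0xFFFFFFFFL);
        // MurmurHash3 fmix64: a bijection, so distinct key pairs give distinct 64-bit values.
        z ^= z >>> 33;
        z *= 0xff51afd7ed558ccdL;
        z ^= z >>> 33;
        z *= 0xc4ceb9fe1a85ec53L;
        z ^= z >>> 33;
        return z;
    }

    static Row generate(int pk1, int pk2) {
        Random rnd = new Random(seedFor(pk1, pk2));
        // The order of these calls must never change, or previously written rows stop validating.
        int length = 8 + rnd.nextInt(24); // s1 length between 8 and 31 chars
        StringBuilder s1 = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            s1.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        long l1 = rnd.nextLong();
        int i1 = rnd.nextInt();
        return new Row(pk1, pk2, s1.toString(), l1, i1);
    }

    /** Spot check: regenerate the expected row and compare it with what the database returned. */
    static boolean isValid(int pk1, int pk2, String s1, long l1, int i1) {
        return generate(pk1, pk2).equals(new Row(pk1, pk2, s1, l1, i1));
    }
}
```

Generation and validation run the same code, so checking (pk1 = 1000, pk2 = 1) costs one Random construction and a few calls, with nothing stored besides the row itself.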

Ideas?

(I can't post an answer to my own question since I am a new user, so I am adding this here.) OK, so there are two possible approaches I could pursue:

Approach #1: Use HASH(tablename) ^ HASH(fieldname) ^ pk1 ^ pk2 as the seed. This way, I can easily compute the seed for each column when validating. On the flip side, this could be expensive when generating data for lots of rows, since the seed needs to be computed once per column. So for the above schema, I would need 500k * 3 seeds (to generate 500k records).
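A minimal sketch of Approach #1, assuming String.hashCode() plays the role of HASH (its formula is fixed by the Java specification, so it is stable across runs) and java.util.Random generates each cell from its own seed. The class PerColumnSeed, its methods, and the lowercase-letter strings are hypothetical.

```java
import java.util.Random;

/** Hypothetical helper for Approach #1: one seed per (table, column, row) cell. */
final class PerColumnSeed {

    /** HASH(tablename) ^ HASH(fieldname) ^ pk1 ^ pk2, using String.hashCode() as HASH. */
    static long seed(String table, String field, int pk1, int pk2) {
        return table.hashCode() ^ field.hashCode() ^ pk1 ^ pk2;
    }

    private static Random cellRandom(String table, String field, int pk1, int pk2) {
        return new Random(seed(table, field, pk1, pk2));
    }

    static long expectedL1(String table, int pk1, int pk2) {
        return cellRandom(table, "l1", pk1, pk2).nextLong();
    }

    static int expectedI1(String table, int pk1, int pk2) {
        return cellRandom(table, "i1", pk1, pk2).nextInt();
    }

    /** Lowercase ASCII string of the requested length, derived from the s1 cell seed. */
    static String expectedS1(String table, int pk1, int pk2, int length) {
        Random rnd = cellRandom(table, "s1", pk1, pk2);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }
}
```

One property of this seed formula worth knowing: XOR is symmetric, so rows such as (pk1 = 1, pk2 = 2) and (pk1 = 2, pk2 = 1), or any two rows with the same pk1 ^ pk2, get the same seed and therefore identical column values. If that repetition matters, combining the keys asymmetrically (for example, packing them into one long) avoids it.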

Approach #2 (proposed by Philipp Wendler): Generate one seed per row, and store the seed in the first column of that row. If the first column is an int or long, store the value as-is. If the first column is a string, store the seed in the first x bytes, and then pad it up to the required string length with characters generated using that seed.
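A sketch of Approach #2 for the sample schema, where the first non-key column, s1, is a string. It assumes the seed is written as 16 lowercase hex characters (so "x" is 16 here) and that java.util.Random derives the padding and the remaining columns; the class SeedInRow and its methods are hypothetical, and records require Java 16 or later.

```java
import java.util.Random;

/** Hypothetical helper for Approach #2: the row carries its own seed in the s1 prefix. */
final class SeedInRow {

    static final int SEED_CHARS = 16; // a long printed as fixed-width hex

    record Row(String s1, long l1, int i1) {}

    static Row generate(long seed, int s1Length) {
        if (s1Length < SEED_CHARS) {
            throw new IllegalArgumentException("s1 is too short to hold the seed");
        }
        Random rnd = new Random(seed);
        // %016x prints negative longs as their unsigned two's-complement value.
        StringBuilder s1 = new StringBuilder(String.format("%016x", seed));
        while (s1.length() < s1Length) {
            s1.append((char) ('a' + rnd.nextInt(26))); // padding derived from the seed
        }
        return new Row(s1.toString(), rnd.nextLong(), rnd.nextInt());
    }

    /** Validation needs only the stored row: read the seed back from s1 and regenerate. */
    static boolean isValid(String s1, long l1, int i1) {
        if (s1.length() < SEED_CHARS) {
            return false;
        }
        long seed;
        try {
            seed = Long.parseUnsignedLong(s1.substring(0, SEED_CHARS), 16);
        } catch (NumberFormatException e) {
            return false; // the seed prefix itself was corrupted
        }
        return generate(seed, s1.length()).equals(new Row(s1, l1, i1));
    }
}
```

The per-row seeds could come from a single master generator (for example, successive nextLong() calls on one Random); validation never needs that generator again, because each row is self-describing.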

I like approach #2 better because there is just one seed per row, which makes data generation somewhat faster than with approach #1.

Answer 1

You could just generate arbitrary random data, calculate a hash code (MD5, for example, since it doesn't need to be cryptographically secure), and store the hash code with your data. You can have a separate column for the hash code, or, for example, append it to any string column.

To verify, separate the stored hash code from the rest of the data in that row, recalculate the hash code, and compare the two for equality. If they don't match, your data was modified.

This assumes that you only want to protect your data from accidental modifications (not from a malicious attacker).
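A minimal sketch of this answer's idea, assuming the MD5 digest (as 32 lowercase hex characters) is appended to the string column s1 of the sample schema. The byte layout fed to the digest (length-prefixed string so field boundaries are unambiguous), the class RowChecksum and its methods are hypothetical choices; HexFormat requires Java 17 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper: s1 = payload + MD5(row) in hex. */
final class RowChecksum {

    static final int MD5_HEX_LENGTH = 32; // 16-byte digest as hex

    /** MD5 over a fixed binary encoding of every column (s1 without its hash suffix). */
    static String md5Hex(int pk1, int pk2, String payload, long l1, int i1) {
        byte[] text = payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 + text.length + 8 + 4);
        buf.putInt(pk1).putInt(pk2).putInt(text.length).put(text).putLong(l1).putInt(i1);
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return HexFormat.of().formatHex(md5.digest(buf.array()));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    /** Value to store in s1: the generated payload followed by the hash of the whole row. */
    static String sealS1(int pk1, int pk2, String payload, long l1, int i1) {
        return payload + md5Hex(pk1, pk2, payload, l1, i1);
    }

    /** Split the stored hash off s1, recompute it, and compare. */
    static boolean isValid(int pk1, int pk2, String storedS1, long l1, int i1) {
        if (storedS1.length() < MD5_HEX_LENGTH) {
            return false;
        }
        int split = storedS1.length() - MD5_HEX_LENGTH;
        String payload = storedS1.substring(0, split);
        String storedHash = storedS1.substring(split);
        return storedHash.equals(md5Hex(pk1, pk2, payload, l1, i1));
    }
}
```

Unlike seeded generation, this works with any data source, but it detects corruption only; it cannot tell you what the original values were.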

Answer 2

Maybe something from Apache Commons could be the solution.

Answer 3

This answers only the second part of your question: what about making l1 store a hash of all the other fields? Then you can quickly verify whether anything is corrupted.
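A sketch of storing a hash in l1, assuming a 64-bit FNV-1a hash (a swapped-in choice; any stable 64-bit hash would do) computed over a fixed encoding of pk1, pk2, s1 and i1. The class HashInLong and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: l1 holds a 64-bit hash of every other column. */
final class HashInLong {

    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** FNV-1a (64-bit) over pk1, pk2, a length-prefixed UTF-8 s1, and i1. */
    static long expectedL1(int pk1, int pk2, String s1, int i1) {
        byte[] text = s1.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 + text.length + 4);
        buf.putInt(pk1).putInt(pk2).putInt(text.length).put(text).putInt(i1);
        long hash = FNV_OFFSET_BASIS;
        for (byte b : buf.array()) {
            hash ^= (b & 0xff);  // FNV-1a: xor the byte in first...
            hash *= FNV_PRIME;   // ...then multiply (overflow wraps, as intended)
        }
        return hash;
    }

    static boolean isValid(int pk1, int pk2, String s1, long l1, int i1) {
        return l1 == expectedL1(pk1, pk2, s1, i1);
    }
}
```

The trade-off is that l1 is no longer an independently generated value, so the database's handling of arbitrary long values is exercised only through whatever the hash happens to produce.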


Answers: 3

Answer 1: 1 vote, posted 2012-02-07 18:54:48
Answer 2: 0 votes, posted 2012-02-07 18:48:14
Answer 3: 0 votes, posted 2012-02-07 18:54:34
