How to generate random SHA1 hash to use as ID in node.js?

I am using this line to generate a sha1 id for node.js:

crypto.createHash('sha1').digest('hex');

The problem is that it returns the same id every time.

Is it possible to have it generate a random id each time so I can use it as a database document id?

243,583,606,221,817,150,598,111,409x more entropy

I'd recommend using crypto.randomBytes. It's not sha1, but for id purposes, it's quicker, and just as "random".

var id = crypto.randomBytes(20).toString('hex');
//=> f26d60305dae929ef8640a75e70dd78ab809cfe9

The resulting string will be twice as long as the random bytes you generate; each byte encoded to hex is 2 characters. 20 bytes will be 40 characters of hex.
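As a minimal sketch of the byte-to-hex relationship described above, here is a small wrapper around crypto.randomBytes. The helper name randomHexId and its default argument are my own, for illustration only; they are not part of Node's API.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns a random ID as a hex string.
// Each random byte is encoded as two hex characters.
function randomHexId(byteCount = 20) {
  return crypto.randomBytes(byteCount).toString('hex');
}

const id = randomHexId();
console.log(id);        // a different 40-character hex string on every run
console.log(id.length); // 40
```

Calling randomHexId(10) would give a 20-character ID instead, if a shorter ID is acceptable for your collision budget.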

Using 20 bytes, we have 256^20 or 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976 unique output values. This is identical to SHA1's 160-bit (20-byte) possible outputs.
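If you want to check that figure yourself, BigInt arithmetic (available in reasonably recent Node versions) gives the exact value; a plain Number would lose precision at this size. The variable name is just for illustration.

```javascript
// 20 bytes, 256 possible values per byte.
const outcomes = 256n ** 20n;

console.log(outcomes.toString());
// 1461501637330902918203684832716283019655932542976

// Same thing expressed as 160 bits.
console.log(outcomes === 2n ** 160n); // true
```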

Knowing this, it's not really meaningful for us to shasum our random bytes. It's like rolling a die twice but only accepting the second roll; no matter what, you have 6 possible outcomes each roll, so the first roll is sufficient.


Why is this better?

To understand why this is better, we first have to understand how hashing functions work. Hashing functions (including SHA1) will always generate the same output if the same input is given.

Say we want to generate IDs but our random input is generated by a coin toss. We have "heads" or "tails"

% echo -n "heads" | shasum
c25dda249cdece9d908cc33adcd16aa05e20290f  -

% echo -n "tails" | shasum
71ac9eed6a76a285ae035fe84a251d56ae9485a4  -

If "heads" comes up again, the SHA1 output will be the same as it was the first time

% echo -n "heads" | shasum
c25dda249cdece9d908cc33adcd16aa05e20290f  -

Ok, so a coin toss is not a great random ID generator because we only have 2 possible outputs.

If we use a standard 6-sided die, we have 6 possible inputs. Guess how many possible SHA1 outputs? 6!

input => (sha1) => output
1 => 356a192b7913b04c54574d18c28d46e6395428ab
2 => da4b9237bacccdf19c0760cab7aec4a8359010b0
3 => 77de68daecd823babbb58edb1c8e14d7106e83bb
4 => 1b6453892473a467d07372d45eb05abc2031647a
5 => ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
6 => c1dfd96eea8cc2b62785275bca38ac261256e278
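The same experiment can be run from Node. This is only a sketch; sha1Hex is a helper name I made up, built on the standard crypto.createHash API.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA1 of a string, as a hex digest.
function sha1Hex(input) {
  return crypto.createHash('sha1').update(input).digest('hex');
}

// Six possible inputs can never yield more than six distinct outputs.
const faces = ['1', '2', '3', '4', '5', '6'];
const outputs = new Set(faces.map(sha1Hex));
console.log(outputs.size); // 6

// Same input, same output, every time.
console.log(sha1Hex('heads') === sha1Hex('heads')); // true
```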

It's easy to delude ourselves into thinking that, just because the output of our function looks very random, it is very random.

We both agree that a coin toss or a 6-sided die would make a bad random id generator, because our possible SHA1 results (the value we use for the ID) are very few. But what if we use something that has a lot more outputs? Like a timestamp with milliseconds? Or JavaScript's Math.random? Or even a combination of those two?!

Let's compute just how many unique ids we would get ...


The uniqueness of a timestamp with milliseconds

When using (new Date()).valueOf().toString(), you're getting a 13-character number (e.g., 1375369309741). However, since this is a sequentially updating number (once per millisecond), the outputs are almost always the same. Let's take a look

for (var i=0; i<10; i++) {
  console.log((new Date()).valueOf().toString());
}
console.log("OMG so not random");

// 1375369431838
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431840
// 1375369431840
// OMG so not random

To be fair, for comparison purposes, in a given minute (a generous operation execution time), you will have 60*1000 or 60000 uniques.
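To put a number on how often the timestamp repeats in practice, here is a rough sketch that counts distinct millisecond timestamps in a tight loop. The iteration count is arbitrary, and the exact result depends on your machine.

```javascript
// Collect millisecond timestamps from a tight loop and count the distinct ones.
const iterations = 10000; // arbitrary sample size
const seen = new Set();

for (let i = 0; i < iterations; i++) {
  seen.add(Date.now().toString());
}

// On typical hardware the loop finishes within a few milliseconds,
// so only a handful of distinct values are seen.
console.log(seen.size + ' distinct timestamps out of ' + iterations + ' calls');
```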


The uniqueness of Math.random

Now, when using Math.random, because of the way JavaScript represents 64-bit floating point numbers, you'll get a number with length anywhere between 13 and 24 characters long. A longer result means more digits which means more entropy. First, we need to find out which is the most probable length.

The script below will determine which length is most probable. We do this by generating 1 million random numbers and incrementing a counter based on the .length of each number.

// get distribution
var counts = [], rand, len;
for (var i=0; i<1000000; i++) {
  rand = Math.random();
  len  = String(rand).length;
  if (counts[len] === undefined) counts[len] = 0;
  counts[len] += 1;
}

// calculate % frequency
var freq = counts.map(function(n) { return n/1000000 *100 });

By dividing each counter by 1 million, we get the probability of each length of number returned from Math.random.

len   frequency(%)
------------------
13    0.0004  
14    0.0066  
15    0.0654  
16    0.6768  
17    6.6703  
18    61.133  <- highest probability
19    28.089  <- second highest probability
20    3.0287  
21    0.2989  
22    0.0262
23    0.0040
24    0.0004

So, even though it's not entirely true, let's be generous and say you get a 19-character-long random output: 0.1234567890123456789. The first characters will always be 0 and ., so really we're only getting 17 random characters. This leaves us with 10^17 +1 (for possible 0; see notes below) or 100,000,000,000,000,001 uniques.


So how many random inputs can we generate?

Ok, we calculated the number of results for a millisecond timestamp and Math.random

      100,000,000,000,000,001 (Math.random)
*                      60,000 (timestamp)
-----------------------------
6,000,000,000,000,000,060,000

That's a single 6,000,000,000,000,000,060,000-sided die. Or, to make this number more humanly digestible, this is roughly the same number as

input                                            outputs
------------------------------------------------------------------------------
( 1×) 6,000,000,000,000,000,060,000-sided die    6,000,000,000,000,000,060,000
(28×) 6-sided die                                6,140,942,214,464,815,497,216
(72×) 2-sided coins                              4,722,366,482,869,645,213,696
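These figures are too large for a JavaScript Number to hold exactly, but they can be reproduced with BigInt. A small sketch, with variable names of my own choosing:

```javascript
// Outcome counts from the two sections above.
const mathRandomOutcomes = 10n ** 17n + 1n; // 17 random digits, plus a possible 0
const timestampOutcomes = 60n * 1000n;      // milliseconds in one minute

const combined = mathRandomOutcomes * timestampOutcomes;
console.log(combined.toString()); // 6000000000000000060000

// Roughly equivalent numbers of dice rolls and coin tosses.
console.log((6n ** 28n).toString()); // 6140942214464815497216
console.log((2n ** 72n).toString()); // 4722366482869645213696
```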

Sounds pretty good, right? Well, let's find out ...

SHA1 produces a 20-byte value, with a possible 256^20 outcomes. So we're really not using SHA1 to its full potential. Well, how much are we using?

node> 6000000000000000060000 / Math.pow(256,20) * 100

A millisecond timestamp combined with Math.random uses only about 4.11e-25 percent of SHA1's 160-bit potential!

generator               sha1 potential used
-----------------------------------------------------------------------------
crypto.randomBytes(20)  100%
Date() + Math.random()    0.000000000000000000000000411%  (4.11e-25)
6-sided die               0.000000000000000000000000000000000000000000000411%  (4.11e-46)
A coin                    0.000000000000000000000000000000000000000000000137%  (1.37e-46)

Holy cats, man! Look at all those zeroes. So how much better is crypto.randomBytes(20)? 243,583,606,221,817,150,598,111,409 times better.
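The "times better" figure is just the number of SHA1 outcomes divided by the number of timestamp + Math.random inputs. Here is a rough check with BigInt; note that BigInt division truncates, so the last digit may differ from a rounded result.

```javascript
const sha1Outcomes = 256n ** 20n;               // every possible 20-byte value
const combinedInputs = 6000000000000000060000n; // timestamp * Math.random estimate

// Integer division: how many times more outcomes randomBytes(20) can produce.
const ratio = sha1Outcomes / combinedInputs;
console.log(ratio.toString());        // a 27-digit number, about 2.4e26
console.log(ratio.toString().length); // 27
```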


Notes about the +1 and frequency of zeroes

If you're wondering about the +1, it's possible for Math.random to return a 0, which means there's 1 more possible unique result we have to account for.

Based on the discussion that happened below, I was curious about how frequently a 0 would come up. Here's a little script, random_zero.js, I made to get some data

#!/usr/bin/env node
var count = 0;
while (Math.random() !== 0) count++;
console.log(count);

Then, I ran it in 4 parallel processes (I have a 4-core processor), appending the output to a file

$ yes | xargs -n 1 -P 4 node random_zero.js >> zeroes.txt

So it turns out that a 0 is not that hard to get. After 100 values were recorded, the average was

1 in 3,164,854,823 randoms is a 0

Cool! More research would be required to know whether that number is on par with a uniform distribution for v8's Math.random implementation.

Have a look here: How do I use node.js Crypto to create a HMAC-SHA1 hash? I'd create a hash of the current timestamp + a random number to ensure hash uniqueness:

var current_date = (new Date()).valueOf().toString();
var random = Math.random().toString();
crypto.createHash('sha1').update(current_date + random).digest('hex');

Do it in the browser, too!

EDIT: this didn't really fit into the flow of my previous answer. I'm leaving it here as a second answer for people that might be looking to do this in the browser.

You can do this client side in modern browsers, if you'd like

// str byteToHex(uint8 byte)
//   converts a single byte to a hex string
function byteToHex(byte) {
  return ('0' + byte.toString(16)).slice(-2);
}

// str generateId(int len);
//   len - must be an even number (default: 40)
function generateId(len = 40) {
  var arr = new Uint8Array(len / 2);
  window.crypto.getRandomValues(arr);
  return Array.from(arr, byteToHex).join("");
}

console.log(generateId())
// "1e6ef8d5c851a3b5c5ad78f96dd086e4a77da800"

console.log(generateId(20))
// "d2180620d8f781178840"

Browser requirements

Browser    Minimum Version
--------------------------
Chrome     11.0
Firefox    21.0
IE         11.0
Opera      15.0
Safari     5.1

If you want unique identifiers, you should use a UUID (Universally Unique Identifier) / GUID (Globally Unique Identifier).

A hash is supposed to be deterministic, unique (in practice), and of fixed length for input of any size. So no matter how many times you run the hash function, the output will be the same if you use the same input.

Version 4 UUIDs are randomly generated and, for practical purposes, unique. There is a package called 'uuid'; you can install it via npm with

npm install uuid

In your code, import the module with

const { v4: uuidv4 } = require('uuid');

Call the method uuidv4 (or whatever you named it while importing) and log, store, or assign the result. The method returns a UUID in the form of a string.

console.log(uuidv4()); // Example output: '59594fc8-6a35-4f50-a966-4d735d8402ea'

Here is the npm link (if you need it): https://www.npmjs.com/package/uuid
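As an aside, if you'd rather not add a dependency, newer Node versions (14.17 and later, as far as I know) also ship crypto.randomUUID in the built-in crypto module. A minimal sketch, assuming such a Node version:

```javascript
// Built-in alternative to the 'uuid' package; requires a Node version
// that provides crypto.randomUUID (assumed here: 14.17 or later).
const { randomUUID } = require('crypto');

const id = randomUUID();
console.log(id); // a version 4 UUID string, different on every run
```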

Using crypto is a good approach because it's a native and stable module, but there are cases where you can use bcrypt if you want to create a really strong and secure hash. I use it for passwords; it has a lot of techniques for hashing, creating salts, and comparing passwords.

Technique 1 (generate a salt and hash on separate function calls)

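const bcrypt = require('bcrypt');
const saltRounds = 10; // cost factor; 10 is an assumed example value
const salt = bcrypt.genSaltSync(saltRounds);
const hash = bcrypt.hashSync(myPlaintextPassword, salt);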

Technique 2 (auto-gen a salt and hash):

const hash = bcrypt.hashSync(myPlaintextPassword, saltRounds);

For more examples you can check here: https://www.npmjs.com/package/bcrypt
