简体   繁体   English

散列大于或等于56个字节的数据

[英]Hashing Data greater than or equal to 56 bytes

I'm trying to understand the internal mechanism used in the SHA-1 hash function. 我试图了解SHA-1哈希函数中使用的内部机制。 I'm also referring to the FIPS-180 standard. 我也指的是FIPS-180标准。

Managed to write an implementation is able to return accurate results hashing a string "abc". 设法编写一个实现,能够对哈希字符串“ abc”返回正确的结果。 However I'm still blunt how to interpret strings >= 56 bytes. 但是我仍然直言不讳如何解释> = 56字节的字符串。 The FIPS-180 standard specifies to use 1024bit for a string size of 56 bytes. FIPS-180标准指定将1024位用于56字节的字符串。

Sha1 operates on messages with block size of 512bits (64 bytes). Sha1对块大小为512位(64字节)的消息进行操作。

What happens if you have a message of length 104 bytes? 如果您收到一条长度为104字节的消息,该怎么办? First you need to pad the message in order to be able to operate on blocks of size 512bits. 首先,您需要填充消息,以便能够在512位大小的块上运行。

You take the incomplete last block of 104 - 64 = 40 bytes, and go through the message padding phase that is described in fips-180 in order to get a block size of 512bits (64bytes), and perform the message digest calculation. 您采用104-64 = 40字节的不完整最后一个块,并经过fips-180中描述的消息填充阶段,以获取512位(64字节)的块大小,并执行消息摘要计算。

The padding phase (taken from wiki ) is: 填充阶段(来自wiki )是:

Pre-processing:
append the bit '1' to the message
append 0 ≤ k < 512 bits '0', so that the resulting message length (in bits) is congruent to 448 ≡ −64 (mod 512)
append length of message (before pre-processing), in bits, as 64-bit big-endian integer

In case you care to look through it, here's some code I wrote as kind of a "reference implementation" a few years ago -- it's intended more to map reasonably closely to the standards, than to be fast, efficient, etc. 以防万一,您可以看一下几年前我作为“参考实现”编写的一些代码,它旨在更合理地映射到标准,而不是快速,高效等。

sha1.h: sha1.h:

#ifndef SHA_1_H_INCLUDED_
#define SHA_1_H_INCLUDED_

// This is a relatively straightforward implementation of SHA-1. It makes no particular
// attempt at optimization, instead aiming toward easy verification against the standard.
// To that end, many of the variable names are identical to those used in FIPS 180-2 and
// FIPS 180-3. 
//
// The code should be fairly portable, within a few limitations:
// 1. It requires that 'char' have 8 bits. In theory this is avoidable, but I don't think
// it's worth the bother.
// 2. It only deals with inputs in (8-bit) bytes. In theory, SHA-1 can deal with a number of 
// bits that's not a multiple of 8, but I've never needed it. Since the padding always results
// in a byte-sized stream, the only parts that would need changing would be reading and padding
// the input. The main hashing portion would be unaffected.
//
// Compiles cleanly with:
//    MS VC++ 9.0SP1 (x86 or x64): -W4 -Za
//    gc++ 3.4: -ansi -pedantic -Wall
//    comeau 4.3.3: --vc71 
// Appears to work corectly in all cases. 
// You can't use maximum warnings with Comeau though -- this code itself doesn't give problems
// (that I know of) but Microsoft's headers give it *major* heartburn.
// 
//
// Written by Jerry Coffin, February 2008
// 
// You can use this software any way you want to, with following limitations
// (shamelessly stolen from the Boost software license):
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
// SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
// FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS IN THE SOFTWARE.
// 
// If you put this to real use, I'd be happy to hear about it. If you find a bug, 
// I'd be interested in hearing about that too. There's even a pretty good chance 
// that I'll try to fix it, though I certainly can't guarantee that.
// 
#include <algorithm>
#include <vector>
#include <string>
#include <assert.h>
#include <iostream>
#include <sstream>
#include <iomanip>

#if defined(_MSC_VER) && _MSC_VER < 1600
typedef unsigned int uint32_t;
typedef unsigned __int64 uint64_t;
#else
#include <stdint.h>
#endif

namespace crypto { 
namespace {
    struct ternary_operator { 
        virtual uint32_t operator()(uint32_t x, uint32_t y, uint32_t z) = 0;
    };
}

class sha1 { 
    static const size_t hash_size = 5;
    static const size_t min_pad = 64;
    static const size_t block_bits = 512;
    static const size_t block_bytes = block_bits / 8;
    static const size_t block_words = block_bytes / 4;

    std::vector<uint32_t> K;
    std::vector<uint32_t> H;
    std::vector<uint32_t> W;
    std::vector<ternary_operator *> fs;
    uint32_t a, b, c, d, e, T;
    static const size_t block_size = 16;
    static const size_t bytes_per_word = 4;
    size_t total_size;

    // hash a 512-bit block of input.
    //
    void hash_block(std::vector<uint32_t> const &block);

    // Pad the input to a multiple of 512 bits, and add the length
    // in binary to the end.
    static std::string pad(std::string const &input, size_t size);

    // Turn 64 bytes into a block of 16 uint32_t's.
    std::vector<uint32_t> make_block(std::string const &in);

public:

    // Construct a SHA-1 object. More expensive that typical 
    // ctor, but not expected to be copied a lot or anything
    // like that, so it should be fairly harmless.
    sha1();

    // The two ways to provide input for hashing: as a stream or a string.
    // Either way, you get the result as a vector<uint32_t>. It's a fairly
    // small vector, so even if your compiler doesn't do return-value 
    // optimization, the time it takes isn't like to be significant.
    // 
    std::vector<uint32_t> operator()(std::istream &in);
    std::vector<uint32_t> operator()(std::string const &input);

    friend std::ostream &operator<<(std::ostream &os, sha1 const &s);
};
}

#endif

And the implementation: 并执行:

// Sha1.cpp:
#include "sha.h"
// Please see comments in sha.h for licensing information, etc.
//

// Many people don't like the names I usually use for namespaces, so I've kept this one
// short and simple.
//
namespace crypto {
namespace {
//    void show(char const *caption, sha1 const &s, std::ostream &os) {
//        os << caption << s;
//    }

uint32_t ROTL(uint32_t const &value, unsigned bits) { 
    uint32_t mask = (1 << bits) - 1;
    return value << bits | (value >> (32-bits))&mask;
}

struct f1 : ternary_operator {
    uint32_t operator()(uint32_t x, uint32_t y, uint32_t z) { 
        return (x & y) ^ (~x&z);
    }
};

struct f2 : ternary_operator {
    uint32_t operator()(uint32_t x, uint32_t y, uint32_t z) {
        return x ^ y ^ z;
    }
};

struct f3 : ternary_operator {
    uint32_t operator()(uint32_t x, uint32_t y, uint32_t z) {
        return (x&y) ^ (x&z) ^ (y&z);
    }
};

uint32_t word(int a, int b, int c, int d) {
    a &= 0xff;
    b &= 0xff;
    c &= 0xff;
    d &= 0xff;
    int val =  a << 24 | b << 16 | c << 8 | d;
    return val;
}
}

// hash a 512-bit block of input.
//
void sha1::hash_block(std::vector<uint32_t> const &block) {
    assert(block.size() == block_words);

    int t;
    std::copy(block.begin(), block.end(), W.begin());
    for (t=16; t<80; t++) {
        W[t] = ROTL(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1);
    }

    a = H[0]; b = H[1]; c = H[2]; d = H[3]; e = H[4];

    for (t=0; t<80; t++) {
        T = ROTL(a, 5) + (*fs[t])(b, c, d) + e + K[t] + W[t];
        e = d;
        d = c;
        c = ROTL(b, 30);
        b = a;
        a = T;
    }
    H[0] += a; H[1] += b; H[2] += c; H[3] += d; H[4] += e;
}

// Pad the input to a multiple of 512 bits, and add the length
// in binary to the end.
std::string sha1::pad(std::string const &input, size_t size) {
    size_t length = size * 8 + 1;
    size_t remainder = length % block_bits;
    size_t pad_len = block_bits-remainder;

    if (pad_len < min_pad)
        pad_len += block_bits;
    ++pad_len;

    pad_len &= ~7;
    std::string padding(pad_len/8, '\0');

    for (size_t i=0; i<sizeof(padding.size()); i++)
        padding[padding.size()-i-1] = (length-1) >> (i*8) & 0xff;
    padding[0] |= (unsigned char)0x80;

    std::string ret(input+padding);
    return ret;
}

// Turn 64 bytes into a block of 16 uint32_t's.
std::vector<uint32_t> sha1::make_block(std::string const &in) { 
    assert(in.size() >= block_bytes);

    std::vector<uint32_t> ret(block_words);

    for (size_t i=0; i<block_words; i++) {
        size_t s = i*4;
        ret[i] = word(in[s], in[s+1], in[s+2], in[s+3]);
    }
    return ret;
}


// Construct a SHA-1 object. More expensive that typical 
// ctor, but not expected to be copied a lot or anything
// like that, so it should be fairly harmless.
sha1::sha1() : K(80), H(5), W(80), fs(80), total_size(0) {
    static const uint32_t H0[] = { 
        0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
    };
    static const uint32_t Ks[] = {
        0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6
    };

    std::copy(H0, H0+hash_size, H.begin());

    std::fill_n(K.begin()+00, 20, Ks[0]);
    std::fill_n(K.begin()+20, 20, Ks[1]);
    std::fill_n(K.begin()+40, 20, Ks[2]);
    std::fill_n(K.begin()+60, 20, Ks[3]);

    static f1 sf1;
    static f2 sf2;
    static f3 sf3;

    std::fill_n(fs.begin()+00, 20, &sf1);
    std::fill_n(fs.begin()+20, 20, &sf2);
    std::fill_n(fs.begin()+40, 20, &sf3);
    std::fill_n(fs.begin()+60, 20, &sf2);
}

// The two ways to provide input for hashing: as a stream or a string.
// Either way, you get the result as a vector<uint32_t>. It's a fairly
// small vector, so even if your compiler doesn't do return-value 
// optimization, the time it takes isn't likely to be significant.
// 
std::vector<uint32_t> sha1::operator()(std::string const &input) { 
    std::string temp(pad(input, total_size + input.size()));
    std::vector<uint32_t> block(block_size);

    size_t num = temp.size()/block_bytes;

    for (unsigned block_num=0; block_num<num; block_num++) {
        size_t s;
        for (size_t i=0; i<block_size; i++) {
            s = block_num*block_bytes+i*4;
            block[i] = word(temp[s], temp[s+1], temp[s+2], temp[s+3]);
        }
        hash_block(block);  
    }
    return H;
}

std::vector<uint32_t> sha1::operator()(std::istream &in) { 
    char raw_block[65];

    while (in.read(raw_block, block_bytes)) {
        total_size += block_bytes;
        std::string b(raw_block, in.gcount());
        hash_block(make_block(b));
    }
    std::string x(raw_block, in.gcount());
    return operator()(x);
}

std::ostream &operator<<(std::ostream &os, sha1 const &s) { 
    // Display a SHA-1 result in hex.
    for (size_t i=0; i<(s.H).size(); i++)
        os << std::fixed << std::setprecision(8) << std::hex << std::setfill('0') << (s.H)[i] << " ";
    return os << std::dec << std::setfill(' ') << "\n";
}

}

#ifdef TEST
#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>

// A minimal test harness to check that it's working correctly. Strictly black-box
// testing, with no attempt at things like coverage analysis. Nonetheless, I believe
// it should cover most of the code -- the core hashing code all gets used for every
// possible value. The padding code should be tested fairly thoroughly as well -- the
// first test is a fairly simple case, and the second the more complex one (where the 
// padding requires adding another block).
class tester {
    bool verify(uint32_t *test_val, std::vector<uint32_t> const &hash, std::ostream &os) {
        // Verify that a result matches a test value and report result.
        for (size_t i=0; i<hash.size(); i++)
            if (hash[i] != test_val[i]) {
                os << "Mismatch. Expected: " << test_val[i] << ", but found: " << hash[i] << "\n";
                return false;
            }
            os << "Message digest Verified.\n\n";
            return true;
    }

public:

    bool operator()(uint32_t *test_val, std::string const &input) {
        std::cout << "Testing hashing from string:\n\"" << input << "\"\n";
        crypto::sha1 hasher1;
        std::vector<uint32_t> hash = hasher1(input);
        std::cout << "Message digest is:\n\t" << hasher1;
        bool verified = verify(test_val, hash, std::cerr);

        crypto::sha1 hasher2;
        std::cout << "Testing hashing from Stream:\n";
        std::istringstream buf(input);
        hash = hasher2(buf);
        std::cout << "Message digest is:\n\t" << hasher2;

        return verified & verify(test_val, hash, std::cerr);
    }
};

int main() {
    // These test values and results come directly from the SHA-1 FIPS pub.
    //
    char const *input1 = "abc";
    char const *input2 = "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq";
    uint32_t result1[] = {0xA9993E36, 0x4706816A, 0xBA3E2571, 0x7850C26C, 0x9CD0D89D};
    uint32_t result2[] = {0x84983E44, 0x1C3BD26E, 0xBAAE4AA1, 0xF95129E5, 0xE54670F1};
    bool correct = tester()(result1, input1);
    correct &= tester()(result2, input2);
    if (correct)
        std::cerr << "All Tests passed!\n";
    else
        std::cerr << "Test Failed!\n";
}
#elif defined(MAIN) 

#include <sstream>
#include <fstream>
#include <iostream>

int main(int argc, char **argv) { 
    if (argc < 2) {
        std::cerr << "Usage: sha1 [filename]\n";
        return EXIT_FAILURE;
    }
    for (int i=1; i<argc; i++) {
        crypto::sha1 hash;
        std::ifstream in(argv[i], std::ios_base::binary);
        if (in.good()) {
            hash(in);
            std::cout << "SHA-1(" << argv[i] << ") = " << hash << "\n";
        }
    }
    return 0;
}
#endif

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 使用 for 循环生成小于或等于 56 且总和大于 10 的数字的 2 位数列表 - Using a for loop to generate a 2 digit list of numbers less than or equal to 56, and whose sum is greater than 10 C ++大于或等于运算符 - C++ greater than or equal to operator 如果第一位小数大于或等于5,如何对数字+1? - How to +1 to the digit if the first decimal is greater than or equal to 5? 为什么 std::sort 在比较函数使用大于 (&gt;) 而不是大于或等于 (&gt;=) 时起作用? - Why does std::sort work when the comparison function uses greater-than (>), but not greater-than-or-equal (>=)? c++ 使用大于等于运算符时出错 - c++ error when using greater than equal to operators 找出平均值大于或等于 K 的子数组的数量 - Find the number of subarrays whose average is greater than or equal K 如何找到小于或等于X的最大值和大于或等于X的最小值? - How do I find the largest value smaller than or equal to X and the smallest value greater than or equal to X? OpenSSL错误“数据大于mod len” - OpenSSL error “data greater than mod len” 如何同时返回小于或等于平均值​​的整数和大于平均值的整数 - How Can I return both number of ints less than or equal to average and number of ints greater than average 了解散列算法的位和字节 - Understanding bits and bytes of a hashing algorithm
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM