Getting the SHA1 of a Unicode string in Crypto++

Question

I'm learning C++ on my own, and I've been stuck on one problem for more than a week. I hope you can help me.

I need to get a SHA1 digest of a Unicode string (like Привет), but I don't know how to do that.

I tried to do it like this, but it returns the wrong digest!

For std::wstring(L"Ы") it returns A469A61DF29A7568A6CC63318EA8741FA1CF2A7
I need 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373

Regards, and sorry for my English :).

CryptoPP 5.6.2, MSVC++ 2013

#include <iostream>
#include <string>
#include "cryptopp562/cryptlib.h"
#include "cryptopp562/sha.h"
#include "cryptopp562/hex.h"
#include "cryptopp562/filters.h" // StringSource, HashFilter, StringSink

int main() {

    std::wstring string(L"Ы");
    size_t bs_size = string.length() * sizeof(wchar_t);

    byte* bytes_string = new byte[bs_size];

    size_t n = 0; // number of bytes actually written
    for (size_t i = 0; i < string.length(); i++) {
        wchar_t wcharacter = string[i];

        // Split the 16-bit code unit into its high and low bytes
        int high_byte = (wcharacter & 0xFF00) >> 8;
        int low_byte = wcharacter & 0xFF;

        // The high byte is only written when it is non-zero
        if (high_byte != 0) {
            bytes_string[n++] = (byte)high_byte;
        }

        bytes_string[n++] = (byte)low_byte;
    }

    CryptoPP::SHA1 sha1;
    std::string hash;

    // Hash the n bytes and hex-encode the digest into `hash`
    CryptoPP::StringSource ss(bytes_string, n, true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );

    std::cout << hash << std::endl;

    delete[] bytes_string;
    return 0;
}

Answer 1

You say 'but it returns the wrong digest'. What are you comparing it with?

Key point: digests such as SHA-1 don't work on sequences of characters, but on sequences of bytes.

What you're doing in this snippet is generating an ad-hoc encoding of the Unicode characters in the string "Ы". As it turns out, this encoding matches UTF-16 (big-endian, because the high byte is written first) if all the characters in the string are in the BMP (Basic Multilingual Plane, which is true in this case), if none of them has a zero high byte (the loop skips such bytes, so an ASCII character comes out as a single byte), and if the numbers that end up in wcharacter are integers representing Unicode code points (which is probably more or less correct but, I think, not guaranteed).

If the digest you're comparing against was computed by turning the input string into a sequence of bytes with the UTF-8 encoding (which is quite likely), that produces a different byte sequence from yours, so the SHA-1 digest of that sequence will differ from the digest you calculate here.

So:

Check what encoding your test string is using.
You'd be best off using library functions to explicitly generate a UTF-16 or UTF-8 (as appropriate) encoding of the string you want to process, so that the byte sequence you're working with is what you think it is.

There's an excellent introduction to Unicode and encodings in the aptly named article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

Answer 2

I need to get a SHA1 digest of a Unicode string (like Привет), but I don't know how to do that.

The trick here is that you need to know how the Unicode string is encoded. On Windows, a wchar_t is 2 octets, while on Linux a wchar_t is 4 octets. There's a Crypto++ wiki page on it at Character Set Considerations, but it's not that good.

To interoperate most effectively, always use UTF-8. That means you convert UTF-16 or UTF-32 to UTF-8. Because you are on Windows, you will want to call the WideCharToMultiByte function with CP_UTF8 to do the conversion. If you were on Linux, you would use libiconv.

Crypto++ has a built-in function called StringNarrow, implemented with the standard C++ library. It's in the file misc.h. Be sure to call setlocale before using it, because the conversion depends on the current locale.

Stack Overflow has a few questions on using the Windows function. See, for example, How do you properly use WideCharToMultiByte.

I need - 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373

What is the hash (SHA-1, SHA-256, ...)? Is it an HMAC (keyed hash)? Is the information salted (like a stored password)? How is it encoded? I have to ask because I cannot reproduce your desired result:

SHA-1:   2805AE8E7E12F182135F92FB90843BB1080D3BE8
SHA-224: 891CFB544EB6F3C212190705F7229D91DB6CECD4718EA65E0FA1B112
SHA-256: DD679C0B9FD408A04148AA7D30C9DF393F67B7227F65693FFFE0ED6D0F0ADE59
SHA-384: 0D83489095F455E4EF5186F2B071AB28E0D06132ABC9050B683DA28A463697AD
         1195FF77F050F20AFBD3D5101DF18C0D
SHA-512: 0F9F88EE4FA40D2135F98B839F601F227B4710F00C8BC48FDE78FF3333BD17E4
         1D80AF9FE6FD68515A5F5F91E83E87DE3C33F899661066B638DB505C9CC0153D

Here's the program I used. Be sure to specify the length of the wide string. If you don't (and pass -1 for the length), then WideCharToMultiByte will include the terminating NUL (ASCII-Z) character in its calculations. Since we are using a std::string, we don't need the function to include the terminator.

#include <iostream>
#include <stdexcept>
#include <string>
#include <windows.h>

// Crypto++ headers; adjust the paths to your installation
#include "cryptlib.h"
#include "sha.h"
#include "hex.h"
#include "filters.h"   // HashFilter, StringSource, StringSink, Redirector
#include "channels.h"  // ChannelSwitch

using namespace std;
using namespace CryptoPP;

int main(int argc, char* argv[])
{
    wstring m1 = L"Привет"; string m2;

    // First call: ask for the required UTF-8 size (explicit length, so no terminator)
    int req = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), NULL, 0, NULL, NULL);
    if(req <= 0)
        throw runtime_error("Failed to convert string");

    m2.resize((size_t)req);

    // Second call: convert into m2's buffer
    int cch = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), &m2[0], (int)m2.length(), NULL, NULL);
    if(cch <= 0)
        throw runtime_error("Failed to convert string");

    // Should not be required
    m2.resize((size_t)cch);

    string s1, s2, s3, s4, s5;
    SHA1 sha1; SHA224 sha224; SHA256 sha256; SHA384 sha384; SHA512 sha512;

    HashFilter f1(sha1, new HexEncoder(new StringSink(s1)));
    HashFilter f2(sha224, new HexEncoder(new StringSink(s2)));
    HashFilter f3(sha256, new HexEncoder(new StringSink(s3)));
    HashFilter f4(sha384, new HexEncoder(new StringSink(s4)));
    HashFilter f5(sha512, new HexEncoder(new StringSink(s5)));

    // Send the same UTF-8 bytes to every hash filter
    ChannelSwitch cs;
    cs.AddDefaultRoute(f1);
    cs.AddDefaultRoute(f2);
    cs.AddDefaultRoute(f3);
    cs.AddDefaultRoute(f4);
    cs.AddDefaultRoute(f5);

    StringSource ss(m2, true /*pumpAll*/, new Redirector(cs));

    cout << "SHA-1:   " << s1 << endl;
    cout << "SHA-224: " << s2 << endl;
    cout << "SHA-256: " << s3 << endl;
    cout << "SHA-384: " << s4 << endl;
    cout << "SHA-512: " << s5 << endl;

    return 0;
}

Answer 3

This seems to work fine for me.

Rather than fiddling about trying to extract the pieces, I simply cast the wide-character buffer to a const byte* and pass that (and the adjusted size) to the hash function.

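#include <iostream>
#include <string>
#include <cryptopp/sha.h>      // adjust the paths to your Crypto++ installation
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>

int main() {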

    std::wstring string(L"Привет");

    CryptoPP::SHA1 sha1;
    std::string hash;

    CryptoPP::StringSource ss(
        reinterpret_cast<const byte*>(string.c_str()), // cast to const byte*
        string.size() * sizeof(std::wstring::value_type), // adjust for size
        true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );

    std::cout << hash << std::endl;

    return 0;
}

Output:

C6F8291E68E478DD5BD1BC2EC2A7B7FC0CEE1420

EDIT: An addition.

The result is going to be encoding-dependent. For example, I ran this on Linux, where wchar_t is 4 bytes. On Windows, wchar_t is only 2 bytes, so the same code hashes different bytes there.

For consistency, it may be better to use UTF-8 and store the text in a normal std::string. This also makes calling the API simpler:

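#include <iostream>
#include <string>
#include <cryptopp/sha.h>      // adjust the paths to your Crypto++ installation
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>

int main() {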

    std::string string("Привет"); // UTF-8 encoded

    CryptoPP::SHA1 sha1;
    std::string hash;

    CryptoPP::StringSource ss(
        string,
        true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );

    std::cout << hash << std::endl;

    return 0;
}

Output:

2805AE8E7E12F182135F92FB90843BB1080D3BE8

3 answers

Answer 1: score 3, posted 2015-04-20 18:48:46
Answer 2: score 3, accepted, posted 2015-04-21 18:38:51
Answer 3: score 2, posted 2015-04-20 18:37:28