Reading a byte array into UTF-16 characters on machines with a specific UTF-16 character size

Question

I have a question about char16_t characters and SHA256 generation with OpenSSL. I'm currently writing code that deals with password hashing. I've generated a 256-bit hash, and I want to store it in the database in a UTF-16 encoded character field. In my C++ code, I use char16_t to store such data. However, there is a problem: char16_t can be wider than 16 bits depending on the machine it ends up on. If I use memcpy to copy the bytes of my SHA256 hash into it, the result may turn out to be a mess on some machines. What should I do in this situation? Read the bytes differently, store hashes in the database differently, or maybe something else?

Answer 1

SHA256 generates 256 essentially random bits (32 bytes) of data. Reinterpreted as UTF-16, those bytes will not always form valid UTF-16 data.
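To make the "not always valid" point concrete: a code unit in the range 0xD800 to 0xDFFF is a surrogate and is only well-formed as part of a high/low pair, so raw hash bytes that happen to land in that range without a partner are rejected by strict UTF-16 consumers. The sketch below is only an illustration; `is_well_formed_utf16` is a hypothetical helper name, not something from OpenSSL or the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if every surrogate code unit in `s`
// belongs to a well-formed (high, low) surrogate pair.
bool is_well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t unit = s[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: the next unit must be a low surrogate.
            if (i + 1 >= s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF) {
                return false;
            }
            ++i;  // skip the low surrogate that completes the pair
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}
```

A hash whose bytes are copied directly into char16_t storage can fail this check, which is why some explicit encoding step is needed.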

You need to somehow encode the 32 bytes into more than 32 bytes of UTF-16 to store them in your database. Alternatively, you can change the database field to a proper 256-bit binary type.

One of the easier-to-implement ways to store it in your DB as a string is to map each byte to one character, 1-to-1 (so the 32 bytes of data are stored with 32 bytes of zeroes in between):

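#include <cstddef>   // std::size_t
#include <iterator>  // std::size (C++17)

unsigned char sha256_hash[256 / 8];
get_hash(sha256_hash);  // fills the buffer with the 32-byte digest
// encoding: widen each byte to one char16_t code unit (0x00 to 0xFF)
char16_t db_data[256 / 8];
for (std::size_t i = 0; i < std::size(db_data); ++i) {
    db_data[i] = char16_t(sha256_hash[i]);
}
write_to_db(db_data);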


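#include <cassert>
#include <cstddef>   // std::size_t
#include <iterator>  // std::size (C++17)

char16_t db_data[256 / 8];
read_from_db(db_data);
// decoding: every stored code unit must fit in a single byte
unsigned char sha256_hash[256 / 8];
for (std::size_t i = 0; i < std::size(sha256_hash); ++i) {
    assert(db_data[i] <= 0xFF);
    sha256_hash[i] = static_cast<unsigned char>(db_data[i]);
}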

Be careful if you are using null-terminated strings, though. You will need one extra character for the null terminator, and you will have to map the 0 byte to something else (0x100 would be a good choice).
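A minimal sketch of that null-safe variant, assuming 8-bit bytes and using `std::u16string` so the terminator is handled by the string itself. The names `Sha256Digest`, `encode_hash` and `decode_hash` are made up for this example:

```cpp
#include <array>
#include <cstddef>
#include <string>

using Sha256Digest = std::array<unsigned char, 256 / 8>;

// Hypothetical helper: one code unit per hash byte, with the 0 byte mapped
// to 0x100 so the encoded text never contains an embedded null.
std::u16string encode_hash(const Sha256Digest& hash) {
    std::u16string out;
    out.reserve(hash.size());
    for (unsigned char byte : hash) {
        out.push_back(byte == 0 ? char16_t(0x100) : char16_t(byte));
    }
    return out;  // c_str() on the result supplies the null terminator
}

// Hypothetical helper: inverse of encode_hash. Returns false if the text has
// the wrong length or contains a code unit that encode_hash never produces.
bool decode_hash(const std::u16string& text, Sha256Digest& hash) {
    if (text.size() != hash.size()) {
        return false;
    }
    for (std::size_t i = 0; i < hash.size(); ++i) {
        const char16_t unit = text[i];
        if (unit == 0x100) {
            hash[i] = 0;
        } else if (unit >= 0x01 && unit <= 0xFF) {
            hash[i] = static_cast<unsigned char>(unit);
        } else {
            return false;
        }
    }
    return true;
}
```

Returning `false` instead of asserting lets the caller treat a corrupted database value as an ordinary error.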

But if you have additional requirements (such as the stored value consisting of readable characters), you might consider base64 or a hexadecimal encoding.
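As an illustration of the hexadecimal option, here is a sketch that turns the 32-byte hash into 64 ASCII hex digits stored as UTF-16, and back. It assumes 8-bit bytes; `Sha256Digest`, `to_hex`, `hex_value` and `from_hex` are hypothetical names chosen for this example:

```cpp
#include <array>
#include <cstddef>
#include <string>

using Sha256Digest = std::array<unsigned char, 256 / 8>;

// Hypothetical helper: two lowercase hex digits per byte, 64 code units total.
std::u16string to_hex(const Sha256Digest& hash) {
    static constexpr char16_t digits[] = u"0123456789abcdef";
    std::u16string out;
    out.reserve(hash.size() * 2);
    for (unsigned char byte : hash) {
        out.push_back(digits[byte >> 4]);    // high nibble
        out.push_back(digits[byte & 0x0F]);  // low nibble
    }
    return out;
}

// Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit.
int hex_value(char16_t c) {
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// Hypothetical helper: inverse of to_hex. Returns false on malformed input.
bool from_hex(const std::u16string& text, Sha256Digest& hash) {
    if (text.size() != hash.size() * 2) {
        return false;
    }
    for (std::size_t i = 0; i < hash.size(); ++i) {
        const int hi = hex_value(text[2 * i]);
        const int lo = hex_value(text[2 * i + 1]);
        if (hi < 0 || lo < 0) {
            return false;
        }
        hash[i] = static_cast<unsigned char>(hi * 16 + lo);
    }
    return true;
}
```

The stored value is twice as long as with the 1-to-1 mapping, but it contains only ASCII digits and letters, so it is always valid UTF-16, never contains a null, and is easy to inspect in the database.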
