
Reading an array of bytes into UTF-16 characters on a machine with a specific UTF-16 character size

I have a question about how char16_t characters interact with SHA-256 hashes generated by OpenSSL. I'm currently writing code that handles password hashing. I've generated a 256-bit hash, and I want to store it in a UTF-16-encoded character field in the database. In my C++ code, I use char16_t to store such data. However, there is a problem: char16_t can be wider than 16 bits, depending on the machine the code ends up on. If I use memcpy to copy the bytes of my SHA-256 hash, the result may be a mess on some machines. What should I do in this situation? Read the bytes differently, store the hashes in the database differently, or something else?
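To make the concern concrete, here is a minimal sketch (not part of the original question) contrasting memcpy with building each 16-bit value arithmetically. memcpy copies the object representation, so what you get depends on the platform's byte order and on sizeof(char16_t); shifting and OR-ing works on values, so it gives the same number everywhere. The names copy_bytes, pack_be and char16_bits are hypothetical, chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>

// Copies raw bytes into a char16_t. The result is platform-dependent:
// {0x12, 0x34} becomes 0x3412 on a little-endian machine and 0x1234 on a
// big-endian one, and if char16_t is wider than 16 bits only part of it is filled.
char16_t copy_bytes(const unsigned char (&bytes)[2]) {
    char16_t out = 0;
    std::memcpy(&out, bytes, std::min(sizeof bytes, sizeof out));
    return out;
}

// Builds the value arithmetically, treating the first byte as the high byte.
// Because this works on values rather than on the object representation,
// the result does not depend on endianness or on sizeof(char16_t).
char16_t pack_be(unsigned char hi, unsigned char lo) {
    return static_cast<char16_t>((hi << 8) | lo);
}

// Storage bits occupied by char16_t on this platform (at least 16).
constexpr std::size_t char16_bits = sizeof(char16_t) * CHAR_BIT;
```

On a typical x86 machine, copy_bytes({0x12, 0x34}) yields 0x3412 while pack_be(0x12, 0x34) yields 0x1234 everywhere.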

SHA-256 generates 256 essentially random bits (32 bytes) of data, and those bits will not always form valid UTF-16. For example, a 16-bit unit in the surrogate range 0xD800 to 0xDFFF that is not part of a proper high/low surrogate pair is invalid.
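A minimal sketch of that check, assuming (for illustration only) that the 32 hash bytes are paired big-endian into 16 code units; the function names are hypothetical. A high surrogate (0xD800 to 0xDBFF) must be immediately followed by a low surrogate (0xDC00 to 0xDFFF), and a low surrogate must never appear on its own. Since 2048 of the 65536 possible unit values are surrogates (1 in 32), roughly 40% of random 32-byte hashes contain at least one surrogate unit, and most of those are not validly paired.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the code units form well-formed UTF-16: every high surrogate
// is immediately followed by a low surrogate, and no low surrogate stands alone.
bool is_valid_utf16(const char16_t* units, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint_least32_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // high surrogate: needs a low surrogate right after it
            if (i + 1 >= count) return false;
            const std::uint_least32_t next = units[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;
            ++i;  // skip the low surrogate we just consumed
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return false;  // lone low surrogate
        }
    }
    return true;
}

// Reinterprets a 32-byte hash as 16 code units, pairing bytes big-endian
// (an assumption; any fixed pairing runs into the same problem).
bool hash_is_valid_utf16(const std::array<unsigned char, 32>& hash) {
    std::array<char16_t, 16> units{};
    for (std::size_t i = 0; i < units.size(); ++i) {
        units[i] = static_cast<char16_t>((hash[2 * i] << 8) | hash[2 * i + 1]);
    }
    return is_valid_utf16(units.data(), units.size());
}
```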

You need to somehow encode the 32 bytes into more than 32 bytes of UTF-16 to store them in your database. Alternatively, you can change the database field to a proper binary type that holds 256 bits (32 bytes), such as BINARY(32) in MySQL or bytea in PostgreSQL.

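One of the simpler ways to store it in your database as a string is to map each byte to one character, 1-to-1. Each char16_t then holds a value from 0x00 to 0xFF, so with 16-bit characters the 32 bytes of data end up interleaved with 32 bytes of zeroes (the high byte of each unit). Because this copies values rather than raw bytes (unlike memcpy), it works no matter how wide char16_t is, and since U+0000 to U+00FF are all valid code points, the result is always well-formed UTF-16: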

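#include <cstddef>
#include <iterator>  // std::size

// encoding (get_hash and write_to_db stand for your own hashing and database code)
unsigned char sha256_hash[256/8];
get_hash(sha256_hash);
char16_t db_data[256/8];
for (std::size_t i = 0; i < std::size(db_data); ++i) {
    // widen each byte into its own char16_t; the value stays in 0x00..0xFF
    db_data[i] = char16_t(sha256_hash[i]);
}
write_to_db(db_data);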


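#include <cassert>

// decoding (read_from_db stands for your own database code)
char16_t db_data[256/8];
read_from_db(db_data);
unsigned char sha256_hash[256/8];
for (std::size_t i = 0; i < std::size(sha256_hash); ++i) {
    // compare the full value (no narrowing cast), so a wide char16_t is checked correctly
    assert(db_data[i] <= 0xFF);
    sha256_hash[i] = (unsigned char) db_data[i];
}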
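A self-contained sketch of the same 1-to-1 mapping, using std::array instead of raw arrays and leaving out the database calls. The names encode_hash, decode_hash, Sha256Hash and DbChars are hypothetical. Here decoding reports malformed input through std::optional rather than assert, because assert is compiled out when NDEBUG is defined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

using Sha256Hash = std::array<unsigned char, 32>;
using DbChars = std::array<char16_t, 32>;

// One char16_t per hash byte; every value lies in U+0000..U+00FF, which are
// all valid code points, so the result is always well-formed UTF-16.
DbChars encode_hash(const Sha256Hash& hash) {
    DbChars out{};
    for (std::size_t i = 0; i < hash.size(); ++i) {
        out[i] = static_cast<char16_t>(hash[i]);
    }
    return out;
}

// Reverses encode_hash; returns std::nullopt if any character does not fit in a byte.
std::optional<Sha256Hash> decode_hash(const DbChars& chars) {
    Sha256Hash out{};
    for (std::size_t i = 0; i < chars.size(); ++i) {
        if (chars[i] > 0xFF) return std::nullopt;
        out[i] = static_cast<unsigned char>(chars[i]);
    }
    return out;
}
```

For any 32-byte input h, decode_hash(encode_hash(h)) gives back h.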

Be careful if you are using null-terminated strings, though. You will need an extra character for the null terminator, and you must map the 0 byte to some other character (0x100 would be a good choice); otherwise a zero byte in the hash would end the string early.
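A minimal sketch of the null-terminated variant, following the suggestion to store byte 0x00 as 0x100 (U+0100, a valid code point). The names encode_hash_cstr, decode_hash_cstr and DbCString are hypothetical; the decoder rejects anything the encoder could not have produced, including a terminator that appears too early.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

using Sha256Hash = std::array<unsigned char, 32>;
// 32 characters of data plus one null terminator.
using DbCString = std::array<char16_t, 33>;

// Same 1-to-1 mapping, except byte 0x00 is stored as 0x100 so that the only
// zero character in the string is the terminator.
DbCString encode_hash_cstr(const Sha256Hash& hash) {
    DbCString out{};
    for (std::size_t i = 0; i < hash.size(); ++i) {
        out[i] = hash[i] == 0 ? char16_t{0x100} : static_cast<char16_t>(hash[i]);
    }
    out[hash.size()] = u'\0';
    return out;
}

// Maps 0x100 back to 0x00; returns std::nullopt on a premature terminator,
// an out-of-range character, or a string longer than 32 characters.
std::optional<Sha256Hash> decode_hash_cstr(const char16_t* s) {
    Sha256Hash out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const char16_t c = s[i];
        if (c == 0x100) {
            out[i] = 0;
        } else if (c >= 0x01 && c <= 0xFF) {
            out[i] = static_cast<unsigned char>(c);
        } else {
            return std::nullopt;
        }
    }
    if (s[out.size()] != u'\0') return std::nullopt;
    return out;
}
```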

But if you have additional requirements (such as the stored value consisting of readable characters), you might consider base64 or hexadecimal encoding instead.
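A minimal sketch of the hexadecimal option, storing the hash as a std::u16string of 64 ASCII characters ('0' to '9', 'a' to 'f'); base64 would be shorter (44 characters with padding for 32 bytes) but takes more code. The names to_hex, hex_value and from_hex are hypothetical, and the decoder accepts either letter case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

using Sha256Hash = std::array<unsigned char, 32>;

// Each byte becomes two lowercase hex digits, giving a 64-character string.
std::u16string to_hex(const Sha256Hash& hash) {
    static constexpr char16_t digits[] = u"0123456789abcdef";
    std::u16string out;
    out.reserve(hash.size() * 2);
    for (unsigned char b : hash) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}

// Value of one hex digit (either case), or -1 if c is not a hex digit.
int hex_value(char16_t c) {
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// Reverses to_hex; returns std::nullopt for a wrong length or a non-hex character.
std::optional<Sha256Hash> from_hex(const std::u16string& s) {
    Sha256Hash out{};
    if (s.size() != out.size() * 2) return std::nullopt;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const int hi = hex_value(s[2 * i]);
        const int lo = hex_value(s[2 * i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out[i] = static_cast<unsigned char>(hi * 16 + lo);
    }
    return out;
}
```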
