C++: using std::string functions for MBCS and std::wstring functions for UTF-16

Question

Has anyone dealt with using std::string functions for MBCS? For example, in C I could do this:

p = _mbsrchr(path, '\\');

but in C++ I'm doing this:

found = path.find_last_of('\\');

If the trail byte is a slash, would find_last_of stop at the trail byte? Same question for std::wstring.

If I need to replace all of one character with another, say all forward slashes with backslashes, what would be the right way to do that? Would I have to check each character for a lead surrogate byte and then skip the trail? Right now I'm doing this for each wchar:

if( *i == L'/' )
    *i = L'\\';

Thanks

Edit: As David correctly points out, there is more to deal with when working with multibyte codepages. Microsoft says to use _mbclen for working with byte indices and MBCS. It does not appear that I can use find_last_of reliably when working with the ANSI codepages.

Answer 1

You don't need to do anything special about surrogate pairs. A single 16-bit character unit that is one half of a surrogate pair cannot also be a non-surrogate character unit.

So,

if( *i == L'/' )
    *i = L'\\';

is perfectly correct.

Equally, you can use find_last_of with wstring.
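To make that concrete, here is a minimal sketch of both operations on a UTF-16 string that contains a surrogate pair. It uses std::u16string rather than std::wstring so that the code units are 16 bits wide on every platform (an assumption of this sketch; on Windows, where wchar_t is 16 bits, std::wstring behaves the same way). The names to_backslashes, last_backslash and example_utf16 are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Replace every '/' code unit with '\\'. No surrogate check is needed:
// surrogate code units lie in 0xD800..0xDFFF, so none can equal u'/' (0x002F).
inline void to_backslashes(std::u16string& path) {
    std::replace(path.begin(), path.end(), u'/', u'\\');
}

// Index of the last backslash code unit, or std::u16string::npos if none.
inline std::u16string::size_type last_backslash(const std::u16string& path) {
    return path.find_last_of(u'\\');
}

// Usage example: the path contains U+1F600, stored as the pair D83D DE00.
inline void example_utf16() {
    std::u16string path = u"C:/tmp/\U0001F600/a.txt";
    to_backslashes(path);
    assert(path == u"C:\\tmp\\\U0001F600\\a.txt");
    assert(last_backslash(path) == 9);  // index counted in 16-bit code units
}
```

Note that the returned index counts code units, not characters, which is exactly what substr and the other string members expect.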

It's more complicated for multi-byte ANSI codepages. You do need to deal with lead and trail byte issues. My recommendation is to normalise to a more reasonable encoding if you really have to deal with multi-byte ANSI data.
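As an illustration of the lead/trail byte problem, here is a rough sketch of a backslash search that skips trail bytes. It assumes the data is Shift-JIS (code page 932) and hard-codes that encoding's lead-byte ranges in a hypothetical helper, is_sjis_lead. On Windows you would more likely ask the CRT or the OS, for example with _ismbblead or IsDBCSLeadByteEx, instead of hard-coding ranges. The function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: lead-byte test hard-coded for Shift-JIS / code page 932.
inline bool is_sjis_lead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Index of the last '\\' that is a whole character rather than the trail
// byte of a double-byte character, or std::string::npos if there is none.
// The scan must run forward from the start, because a byte's role depends
// on the bytes before it.
inline std::size_t mbcs_find_last_backslash(const std::string& s) {
    std::size_t found = std::string::npos;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_sjis_lead(static_cast<unsigned char>(s[i])) && i + 1 < s.size()) {
            ++i;  // skip the trail byte, whatever its value
        } else if (s[i] == '\\') {
            found = i;
        }
    }
    return found;
}

inline void example_mbcs() {
    // "C:\dir\" followed by KATAKANA SO, which Shift-JIS encodes as 0x83 0x5C.
    const std::string path = "C:\\dir\\\x83\x5C";
    assert(path.find_last_of('\\') == 8);         // lands on the trail byte
    assert(mbcs_find_last_backslash(path) == 6);  // the real separator
}
```

The example shows why find_last_of is unreliable here: the trail byte 0x5C is indistinguishable from a backslash when bytes are compared one at a time.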


Answer 1 was accepted on 2012-05-19 19:52:09.
