Handle std::basic_string<> with different type arguments

Original: 2010-10-17 17:11:20 8 2 c++/c++11

I want to implement a C++ library, and like many other libraries I need to take string arguments from the user and give strings back. The current standard defines std::string and std::wstring (I prefer wstring). In theory, I have to implement every method that takes a string argument twice:

virtual void foo(std::string &) = 0; // convert internally from a previously defined charset to Unicode
virtual void foo(std::wstring &) = 0;

C++0x doesn't make life easier. For char16_t and char32_t I also need:

virtual void foo(std::u16string &) = 0;
virtual void foo(std::u32string &) = 0;

Handling such different types internally (for example, putting them all into a private vector member) requires conversions, wrappers... it's horrible.

Another problem arises if a user (or I) want to work with custom allocators or customized traits classes: everything results in a completely new type. For example, to write custom codecvt specializations for multibyte charsets, the standard says I have to introduce a custom state_type, which requires a custom traits class, which results in a new std::basic_ifstream<> type, and that is completely incompatible with interfaces expecting std::ifstream& as an argument.

One -possible- solution is to construct each library class as a template that manages the value_type, traits and allocators specified by the user. But that's overkill and makes abstract base classes (interfaces) impossible.

Another solution is to specify just one type (e.g. u32string) as the default, so every user must pass data using this type. But now think about a project which uses 3 libraries, where the first lib uses u32string, the second lib u16string and the third lib wstring -> HELL.

What I really want is to declare a method simply as void foo(put_unicode_string_here), without introducing my own UnicodeString or UnicodeStream class.

2 answers

There is always a choice to be made if you don't want to support everything, but I personally feel that restricting input to UTF-8 is the easiest of all. Just use plain old std::string and everyone's happy. In practice, the user of your library will only have to convert to UTF-8 if they are on Windows, and there is a plethora of ways to do that simple task.
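As a rough sketch of the UTF-8-only approach, the interface below takes a single std::string documented as UTF-8, and a caller holding text in another form converts before the call. The names Library and to_utf8 are hypothetical, and the encoder is a hand-written illustration that substitutes U+FFFD for invalid code points; it is not taken from any particular library.

```cpp
#include <string>

// Hypothetical helper: encode Unicode code points (UTF-32) as UTF-8.
// Surrogates and values above U+10FFFF are replaced by U+FFFD.
std::string to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x80) {                         // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                 // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {               // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                 // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical library interface: one overload, documented as UTF-8.
class Library {
public:
    virtual ~Library() = default;
    virtual void foo(const std::string& utf8) = 0;
};
```

A caller with a std::u32string would then write something like lib.foo(to_utf8(text)), and the library itself never sees more than one string type.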

UPDATE: on the other hand, you could template all of your code and keep std::basic_string<T> as a template parameter throughout. This only gets messy if you do different things depending on the size of the template argument.
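A minimal sketch of what "leaving basic_string as a template" could look like: the function is written once against std::basic_string<CharT, Traits, Alloc>, so it accepts string, wstring, u16string, u32string, and strings with custom traits or allocators. The name count_char is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Counts occurrences of a code unit in any basic_string specialization.
// Comparison goes through Traits::eq so that custom traits are respected.
template <class CharT, class Traits, class Alloc>
std::size_t count_char(const std::basic_string<CharT, Traits, Alloc>& s,
                       CharT needle) {
    std::size_t n = 0;
    for (CharT c : s) {
        if (Traits::eq(c, needle)) ++n;
    }
    return n;
}
```

This stays tidy because it only looks at individual code units. Code that needs to decode whole characters would have to branch on sizeof(CharT), which is where the messiness mentioned above begins.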

char_traits is indeed a hopelessly awful wastebasket of random traits. Should every string pre-specify the largest supported file size, case sensitivity, and (ugh) the state type of the encoding mechanism itself? No.

However, what you ask is impossible even with well-designed traits. string and wstring are meaningfully different because the size of the internal character type differs. To run any kind of algorithm, you will need to query the object for char_t. That requires RTTI or virtual functions, because basic_string doesn't (and shouldn't) maintain that information at runtime.

"One -possible- solution is to construct each library class as a template that manages the value_type, traits and allocators specified by the user. But that's overkill and makes abstract base classes (interfaces) impossible."

This is the only complete solution. Templates actually do play well with abstract base classes: a number of templates can derive from a non-template abstract base, or the base can itself be a template. However, it is difficult, if not untenable, because writing perfectly generic code is both delicate and tedious.
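To illustrate how templates and abstract bases can coexist, here is a small sketch in which a non-template interface is implemented by a class template, one instantiation per string type. TextSource and BasicTextSource are hypothetical names, and the two virtual functions are only placeholders for whatever the real interface would expose.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Non-template interface: callers do not need to know the string type.
class TextSource {
public:
    virtual ~TextSource() = default;
    virtual std::size_t length() const = 0;          // number of code units
    virtual std::size_t code_unit_size() const = 0;  // sizeof the character type
};

// Template implementation: works for any basic_string specialization,
// including ones with custom traits or allocators.
template <class String>
class BasicTextSource : public TextSource {
public:
    explicit BasicTextSource(String s) : text_(std::move(s)) {}

    std::size_t length() const override { return text_.size(); }

    std::size_t code_unit_size() const override {
        return sizeof(typename String::value_type);
    }

private:
    String text_;
};
```

The virtual call is what recovers the character type at runtime, which is the cost mentioned earlier: basic_string itself carries no such information.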

"Another solution is to specify just one type (e.g. u32string) as the default, so every user must pass data using this type. But now think about a project which uses 3 libraries, where the first lib uses u32string, the second lib u16string and the third lib wstring -> HELL."

This is why I'm scared by C++11's "improved" Unicode support. It simplifies direct interaction with file data and discourages abstraction to a common wchar_t internal format. It would have been better to require specific codecvts for UTF-16 and UTF-32 and to specify that wchar_t must be at least 21 bits wide. Whereas before there were only "dumb" char and "smart" wchar_t libraries among clean C++ interfaces, we may now have to contend with additional widths, and char16_t is an instant red flag.

But that's down the road.

If you really do end up using a number of incompatible libraries, and the problem is shuttling data between functions that require different formats, then write a ScopeGuard-style utility to convert from your chosen common format (such as wstring) and back again. This utility can be a template with an explicit specialization for each incompatible format you need, or a non-templated set of classes.
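One way such a guard might look, as a non-templated sketch: it converts on construction and writes the result back on destruction. For portability the example assumes std::u32string as the common format rather than wstring, and a library that wants UTF-16. Utf16Guard, to_utf16 and to_utf32 are hypothetical names, and the converters assume well-formed input.

```cpp
#include <cstddef>
#include <string>

// UTF-32 -> UTF-16; assumes every code point is <= U+10FFFF.
std::u16string to_utf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                                  // encode as a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// UTF-16 -> UTF-32; unpaired surrogates are copied through unchanged.
std::u32string to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                  // consumed the low surrogate
        }
        out += cp;
    }
    return out;
}

// Converts the common-format string to UTF-16 for the guard's lifetime and
// stores the (possibly modified) text back when the guard is destroyed.
// Note: the write-back allocates, so an allocation failure in the
// destructor would terminate the program.
class Utf16Guard {
public:
    explicit Utf16Guard(std::u32string& common)
        : common_(common), temp_(to_utf16(common)) {}
    ~Utf16Guard() { common_ = to_utf32(temp_); }

    Utf16Guard(const Utf16Guard&) = delete;
    Utf16Guard& operator=(const Utf16Guard&) = delete;

    std::u16string& get() { return temp_; }       // pass this to the library

private:
    std::u32string& common_;
    std::u16string temp_;
};
```

Inside a scope, the caller constructs Utf16Guard g(text), hands g.get() to the UTF-16 library, and finds text updated once the scope ends. A templated variant would replace the two free functions with one conversion specialization per format.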