
boost to_upper function of string_algo doesn't take into account the locale

I have a problem with the functions in the string_algo package.

Consider this piece of code:

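```cpp
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

using namespace std;
using boost::to_upper;  // in-place, locale-aware upper-casing from string_algo

int main() {
   try {
      string s = "meißen";
      locale l("de_DE.UTF-8");  // throws std::runtime_error if not installed
      to_upper(s, l);
      cout << s << endl;
   } catch (const std::runtime_error& e) {
      cerr << e.what() << endl;
   }

   try {
      string s = "composición";
      locale l("es_CO.UTF-8");
      to_upper(s, l);
      cout << s << endl;
   } catch (const std::runtime_error& e) {
      cerr << e.what() << endl;
   }
}
```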

The expected output for this code would be:

MEISSEN
COMPOSICIÓN

However, the only thing I get is:

MEIßEN
COMPOSICIóN

So, clearly the locale is not being taken into account. I even tried setting the global locale, without success. What can I do?

In addition to Éric Malenfant's answer: std::locale facets work on a single character at a time. To get better results you may use std::wstring; more characters will be converted that way, but as you can see it is still not perfect (e.g. ß).

I would suggest giving Boost.Locale a try (a new library for Boost, not yet part of Boost), which handles this kind of conversion:

http://cppcms.sourceforge.net/boost_locale/docs/

In particular, see http://cppcms.sourceforge.net/boost_locale/docs/index.html#conversions, which deals with the problem you are describing.

std::toupper assumes a 1:1 conversion, so there is no hope for the ß to SS case, with or without Boost.StringAlgo.

Looking at StringAlgo's code, we see that it does use the locale (except on Borland, it seems). So, for the other case, I'm curious: what is the result of toupper('ó', std::locale("es_CO.UTF-8")) on your platform?

Writing the above makes me think about something else: what is the encoding of the strings in your sources? UTF-8? In that case, std::toupper will see two code units for 'ó', so there is no hope. Latin-1? In that case, using a locale named ".UTF-8" is inconsistent.

You can use boost::locale. Here is an example.

In the standard library there is std::toupper (which boost::to_upper uses), and it operates on one character at a time.

This explains why the ß doesn't work. You didn't say which standard library and code page you are using, so I don't know why the ó didn't work.

What happens if you use wstring instead?

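Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM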