double string conversion and locale

A common internationalization issue is the conversion of double values represented as strings. It comes up in a lot of areas.

Take CSV files, which are called either comma separated or character separated, because sometimes they are stored like

1.2,3.4
5.6,6.4

in English regions or

1,2;3,4
5,6;6,4

in, for example, German regions.

Against this background, it is necessary to know that most of the std:: functions are locale dependent. So with a German locale they will read "1,2" as 1.2 and write it back as "1,2", but with an English OS they will read "1,2" as 1 and write it back as "1".

Because the locale is global state of the application, it is not a good idea to switch it to a different setting. That causes problems when I have to read a German CSV file on an English machine or vice versa.

It's also hard to write code that behaves the same on all machines. C++ streams allow a locale setting per stream.

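#include <cstddef>
#include <locale>
#include <string>

// numpunct facet that overrides only the decimal point character.
class Punctation : public std::numpunct<wchar_t>
{
public:

  typedef wchar_t char_type;
  typedef std::wstring string_type;

  // r = 0 means the locale that receives the facet owns and deletes it.
  explicit Punctation(wchar_t decimalPoint, std::size_t r = 0) :
    std::numpunct<wchar_t>(r), decimalPoint_(decimalPoint)
  {
  }

  Punctation(const Punctation& rhs) :
    std::numpunct<wchar_t>(), decimalPoint_(rhs.decimalPoint_)
  {
  }

protected:

  virtual ~Punctation()
  {
  }

  virtual wchar_t do_decimal_point() const
  {
    return decimalPoint_;
  }

private:

  // Declared but not defined: assignment is not supported.
  Punctation& operator=(const Punctation& rhs);

  const wchar_t decimalPoint_;
};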

...

std::locale newloc(std::locale::classic(), new Punctation(L','));
stream.imbue(newloc);

will allow you to initialize a stream with the classic "C" locale behavior and replace only the decimal point. This also lets me ignore the thousand separator, which may come into play too. German 1000.12 may become "1.000,12", and in English it may become "1,000.12", which ends in complete confusion. Even replacing "," by "." will not help in this situation.

If I have to work with atof and friends, I can use

const char decimal_point = *(localeconv()->decimal_point);

to adapt my behavior.
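One way this might be used is to take text that always uses '.' and swap in whatever decimal point the current C locale expects before handing it to strtod. This is only a sketch: `parse_dot_decimal` is a hypothetical helper, and it assumes the locale's decimal point is a single byte.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse text written with '.' as decimal point
// using the locale-dependent strtod.
// Assumption: the locale's decimal point is a single char.
bool parse_dot_decimal(std::string text, double& out)
{
    const char decimal_point = *(std::localeconv()->decimal_point);

    // Replace '.' by the character strtod expects in the current C locale.
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '.')
            text[i] = decimal_point;
    }

    char* end = 0;
    out = std::strtod(text.c_str(), &end);
    // Succeed only if something was parsed and nothing was left over.
    return end != text.c_str() && *end == '\0';
}
```

Under the default "C" locale the replacement is a no-op. After a call such as setlocale(LC_NUMERIC, ...) with a German locale, the same input would be rewritten to use ',' first.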

So there is an awful lot of stuff just for international double behavior. Even my Visual Studio runs into problems, because the German version wants to write 8,0 as the version into the vcproj file while an English version wants to change it to 8.0. That definitely happened by accident, because in XML it is defined to be 8.0 in all countries of the world.

So I just wanted to describe the problem a bit and ask about aspects I may have ignored. Things that I know:

  • decimal point is locale dependent
  • thousand separator is locale dependent
  • exponent is locale dependent

//                  German       English     Also known
// decimal point       ,            .            
// exponent            e/E          e/E          d/D
// thousand sep        .            ,

Which country uses which setting? Maybe you can add some interesting examples that I haven't come across yet.

Don't ever use atof( s ). It's a quick & dirty shortcut for strtod( s, 0 ) without the error reporting. (Same for atoi() and strtol().)
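For illustration, the error reporting that atof throws away might be checked roughly like this. `to_double` is a hypothetical wrapper, and treating trailing characters as an error is a design choice of this sketch rather than something strtod requires.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical wrapper around strtod that reports failures instead of hiding them.
bool to_double(const char* s, double& out)
{
    char* end = 0;
    errno = 0;
    const double v = std::strtod(s, &end);

    if (end == s)
        return false;   // nothing could be converted
    if (*end != '\0')
        return false;   // trailing characters after the number
    if (errno == ERANGE)
        return false;   // result out of range for double

    out = v;
    return true;
}
```

With atof, inputs like "abc" and "0" both come back as 0.0, so the caller cannot tell a parse failure from a valid zero.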

If a function be advertised to return an error code in the event of difficulties, thou shalt check for that code, yea, even though the checks triple the size of thy code and produce aches in thy typing fingers, for if thou thinkest 'it cannot happen to me', the gods shall surely punish thee for thy arrogance.

(Henry Spencer, "Ten Commandments for the C Programmer", Commandment #6)

I think you're looking for Appendix D of The C++ Programming Language. You may be interested to know that it is possible to have multiple locales in use at a time in a program.
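A small sketch of what using several locales at once could look like: the same value is written through two streams, one with the classic "C" locale and one with a comma decimal point. `CommaPoint`, `format_with` and `format_both` are hypothetical names for this illustration only.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical facet: classic "C" punctuation with ',' as decimal point.
class CommaPoint : public std::numpunct<char>
{
protected:
    char do_decimal_point() const override { return ','; }
};

// Format a value through a stream that uses the given locale.
std::string format_with(const std::locale& loc, double value)
{
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}

// Returns (dot-decimal text, comma-decimal text) for the same value.
std::pair<std::string, std::string> format_both(double value)
{
    const std::locale dot_locale = std::locale::classic();
    const std::locale comma_locale(std::locale::classic(), new CommaPoint);
    return std::make_pair(format_with(dot_locale, value),
                          format_with(comma_locale, value));
}
```

Both locale objects are alive at the same time and neither touches the global locale, so the rest of the program is unaffected.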

