double to string without scientific notation or trailing zeros, efficiently

This routine is called a zillion times to create large CSV files full of numbers. Is there a more efficient way to do this?

    #include <iomanip>
    #include <sstream>
    #include <string>

    static std::string dbl2str(double d)
    {
        std::stringstream ss;
        ss << std::fixed << std::setprecision(10) << d;              //convert double to string w fixed notation, hi precision
        std::string s = ss.str();                                    //output to std::string
        s.erase(s.find_last_not_of('0') + 1, std::string::npos);     //remove trailing 000s    (123.1200 => 123.12,  123.000 => 123.)
        return (s[s.size()-1] == '.') ? s.substr(0, s.size()-1) : s; //remove dangling decimal (123. => 123)
    }

Before you start, check whether significant time is spent in this function. Do this by measuring, either with a profiler or otherwise. Knowing that you call it a zillion times is all very well, but if it turns out your program still spends only 1% of its time in this function, then nothing you do here can improve your program's performance by more than 1%. If that were the case, the answer to your question would be "for your purposes, no: this function cannot be made significantly more efficient, and you are wasting your time if you try".
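One rough way to do that measurement, short of a profiler, is to time the conversion on its own and compare it with the run time of the whole CSV export. The sketch below is a hypothetical helper, `time_calls_ms` (not part of the question or answer), that uses `std::chrono::steady_clock` and keeps a running sum of the output sizes so the calls cannot be optimized away; the input values are arbitrary.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper: calls f(d) for n different values of d and returns
// the elapsed wall-clock time in milliseconds. `sink` accumulates the
// output sizes so the compiler cannot discard the calls as dead code.
template <typename F>
double time_calls_ms(F f, std::size_t n, std::size_t &sink)
{
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        std::string s = f(static_cast<double>(i) * 0.001); // arbitrary inputs
        sink += s.size();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `time_calls_ms(dbl2str, 10000000, sink)` measures ten million conversions; if that is a small fraction of the export's total time, this function is not the bottleneck.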

First, avoid s.substr(0, s.size()-1). It copies most of the string, and because the return expression is a conditional rather than the name of a local variable, the function is ineligible for NRVO, so I think you will generally get a copy on return. So the first change I'd make is to replace the last line with:

    if(s[s.size()-1] == '.') {
        s.erase(s.end()-1);
    }
    return s;
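For reference, here is the question's function with just that change applied. It is a sketch: the output should be unchanged, and only the `substr` copy and the copy on return are avoided.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// The question's function with only the ending changed: the dangling '.'
// is erased in place, so `return s;` names a local variable and the
// compiler may apply NRVO (or, failing that, move the string).
static std::string dbl2str(double d)
{
    std::stringstream ss;
    ss << std::fixed << std::setprecision(10) << d;          // fixed notation, 10 decimals
    std::string s = ss.str();
    s.erase(s.find_last_not_of('0') + 1, std::string::npos); // 123.1200 => 123.12
    if (s[s.size() - 1] == '.') {                            // 123. => 123
        s.erase(s.end() - 1);
    }
    return s;
}
```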

But if performance is a serious concern, here's how I'd do it. I'm not promising that this is the fastest possible approach, but it avoids some unnecessary allocations and copying. Any approach involving stringstream requires a copy from the stringstream to the result, so we want a lower-level operation: snprintf. The first call, with a null buffer and a size of 0, only computes the length of the output; the second call writes the characters.

    #include <cstdio>
    #include <string>

    static std::string dbl2str(double d)
    {
        // first call only measures the output length (excluding the nul)
        std::size_t len = std::snprintf(0, 0, "%.10f", d);
        std::string s(len+1, 0);
        // technically non-portable, see below
        std::snprintf(&s[0], len+1, "%.10f", d);
        // remove nul terminator
        s.pop_back();
        // remove trailing zeros
        s.erase(s.find_last_not_of('0') + 1, std::string::npos);
        // remove trailing point
        if(s.back() == '.') {
            s.pop_back();
        }
        return s;
    }

The second call to snprintf assumes that std::string uses contiguous storage. This is guaranteed in C++11. It is not guaranteed in C++03, but it holds for all actively maintained implementations of std::string known to the C++ committee. If performance really is important, I think it's reasonable to make that non-portable assumption, since writing directly into the string saves copying into it later.

s.pop_back() is the C++11 way of saying s.erase(s.end()-1), and s.back() is s[s.size()-1].
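Before switching, it is worth checking that the snprintf version really is a drop-in replacement. A minimal sketch, assuming C++11, both functions renamed side by side (`dbl2str_ss` and `dbl2str_snprintf` are hypothetical names), and the default "C" global locale, under which iostream fixed formatting and printf's `%.10f` should produce the same characters. The sample values are arbitrary.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// Question's stringstream version, with the in-place fix for the trailing point.
static std::string dbl2str_ss(double d)
{
    std::stringstream ss;
    ss << std::fixed << std::setprecision(10) << d;
    std::string s = ss.str();
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);
    if (s.back() == '.') s.pop_back();
    return s;
}

// snprintf version from the answer (relies on C++11 contiguous storage).
static std::string dbl2str_snprintf(double d)
{
    std::size_t len = std::snprintf(nullptr, 0, "%.10f", d);
    std::string s(len + 1, '\0');
    std::snprintf(&s[0], len + 1, "%.10f", d);
    s.pop_back();                                            // drop the nul
    s.erase(s.find_last_not_of('0') + 1, std::string::npos); // trailing zeros
    if (s.back() == '.') s.pop_back();                       // dangling point
    return s;
}

// Hypothetical check: true if both versions agree on every sample value.
static bool versions_agree()
{
    const double samples[] = {0.0, -0.0, 1.0, -1.5, 123.12, 1e-11, 0.1, 1e15, -987654.321};
    for (double d : samples) {
        if (dbl2str_ss(d) != dbl2str_snprintf(d)) return false;
    }
    return true;
}
```

Note that both versions turn -0.0 into "-0", so if that matters for your CSV files it needs handling in either case.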

For another possible improvement, you could get rid of the first call to snprintf and instead size s to a fixed value such as std::numeric_limits<double>::max_exponent10 + 14 (basically, the length that -DBL_MAX needs: a sign, max_exponent10 + 1 integer digits, the point, 10 decimals and the terminator). The trouble is that this allocates and zeroes far more memory than is typically needed (322 bytes for an IEEE double). My intuition is that this will be slower than the first call to snprintf, not to mention wasteful of memory if the caller keeps the returned string around for a while. But you can always test it.
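A sketch of that fixed-size variant, under the paragraph's assumption of an IEEE double (`dbl2str_maxbuf` is a hypothetical name). The final `resize` only shrinks the size; in common implementations the capacity stays at 322 bytes, which is the memory cost mentioned above.

```cpp
#include <cstdio>
#include <limits>
#include <string>

// Single snprintf call into a string sized for the worst case, -DBL_MAX
// printed with "%.10f": sign + 309 integer digits + '.' + 10 decimals + '\0'.
// For an IEEE double, max_exponent10 is 308, so maxlen is 322.
static std::string dbl2str_maxbuf(double d)
{
    const std::size_t maxlen = std::numeric_limits<double>::max_exponent10 + 14;
    std::string s(maxlen, '\0'); // allocates and zeroes 322 bytes every call
    int len = std::snprintf(&s[0], maxlen, "%.10f", d);
    s.resize(len > 0 ? static_cast<std::size_t>(len) : 0); // keep the formatted chars only
    s.erase(s.find_last_not_of('0') + 1, std::string::npos); // trailing zeros
    if (!s.empty() && s.back() == '.') s.pop_back();         // dangling point
    return s;
}
```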

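Alternatively, std::max((int)std::log10(d), 0) + 14 computes a reasonably tight upper bound on the size needed, and might be quicker than having snprintf compute it exactly. As written, it assumes d is positive and finite: for negative d use std::fabs(d), and handle zero, infinity and NaN separately, because std::log10 returns NaN for negative arguments and -infinity for zero, and converting either to int is undefined. Rounding can also add a digit (999.99999999999 prints as "1000.0000000000"), so leave a spare byte or check snprintf's return value.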
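A sketch of the log10 estimate, with assumptions that go beyond the one-liner above (each labeled in the comments): `std::fabs` handles negative values, zero, NaN and infinity fall back to a small fixed size, one spare byte absorbs a rounding carry, and if snprintf still reports that the estimate was too small, the string is resized and formatted again. `dbl2str_estimate` is a hypothetical name.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

static std::string dbl2str_estimate(double d)
{
    double a = std::fabs(d); // assumption: log10 of a negative number is NaN
    // Assumption: values below 1, NaN and infinity need at most
    // "-0." + 10 decimals + '\0' = 14 bytes; 15 leaves one spare.
    std::size_t est = 15;
    if (std::isfinite(a) && a >= 1.0) {
        // floor(log10(a)) + 1 digits, sign, '.', 10 decimals, '\0', +1 spare
        est = static_cast<std::size_t>(std::log10(a)) + 15;
    }
    std::string s(est, '\0');
    int len = std::snprintf(&s[0], s.size(), "%.10f", d);
    if (len < 0) return std::string(); // encoding error
    if (static_cast<std::size_t>(len) >= s.size()) {
        // Estimate was too small: resize to the exact length and format again.
        s.assign(static_cast<std::size_t>(len) + 1, '\0');
        std::snprintf(&s[0], s.size(), "%.10f", d);
    }
    s.resize(static_cast<std::size_t>(len));                 // drop unused bytes
    s.erase(s.find_last_not_of('0') + 1, std::string::npos); // trailing zeros
    if (!s.empty() && s.back() == '.') s.pop_back();         // dangling point
    return s;
}
```

Whether this is actually faster than the sizing call to snprintf is exactly the kind of thing to measure rather than assume.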

Finally, you may be able to improve performance by changing the function's interface. For example, instead of returning a new string, you could append to a string passed in by the caller:

    #include <cstdio>
    #include <string>

    void append_dbl2str(std::string &s, double d) {
        std::size_t len = std::snprintf(0, 0, "%.10f", d);
        std::size_t oldsize = s.size();
        s.resize(oldsize + len + 1);
        // technically non-portable
        std::snprintf(&s[oldsize], len+1, "%.10f", d);
        // remove nul terminator
        s.pop_back();
        // remove trailing zeros (a finite value always contains '.', so the
        // search stops inside the appended number and never touches earlier text)
        s.erase(s.find_last_not_of('0') + 1, std::string::npos);
        // remove trailing point
        if(s.back() == '.') {
            s.pop_back();
        }
    }

Then the caller can reserve() plenty of space, call your function several times (presumably with other string appends in between), and write the resulting block of data to the file all at once, without any memory allocation other than the reserve. "Plenty" doesn't have to be the whole file; it could be one line or "paragraph" at a time, but anything that avoids a zillion memory allocations is a potential performance boost.
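A sketch of such a caller, assuming rows arrive as arrays of doubles and output goes to a C `FILE*`. `write_csv_row` is a hypothetical name, and `append_dbl2str` is repeated so the example compiles on its own.

```cpp
#include <cstdio>
#include <string>

// The answer's append_dbl2str, repeated so this example is self-contained.
void append_dbl2str(std::string &s, double d)
{
    std::size_t len = std::snprintf(nullptr, 0, "%.10f", d);
    std::size_t oldsize = s.size();
    s.resize(oldsize + len + 1);
    std::snprintf(&s[oldsize], len + 1, "%.10f", d);
    s.pop_back();                                            // drop the nul
    s.erase(s.find_last_not_of('0') + 1, std::string::npos); // trailing zeros
    if (s.back() == '.') s.pop_back();                       // dangling point
}

// Hypothetical caller: builds one CSV row in a reused buffer and writes it
// with a single fwrite. clear() keeps the capacity, so once `line` is large
// enough no further allocations are expected.
void write_csv_row(std::FILE *out, std::string &line, const double *values, std::size_t n)
{
    line.clear();
    for (std::size_t i = 0; i < n; ++i) {
        if (i != 0) line += ',';
        append_dbl2str(line, values[i]);
    }
    line += '\n';
    std::fwrite(line.data(), 1, line.size(), out);
}
```

Typical use would be `std::string line; line.reserve(4096);` once, then `write_csv_row(out, line, row, n)` for every row.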

Efficient in terms of speed or brevity?

    #include <cstdio>
    #include <iostream>
    char buf[64];
    std::sprintf(buf, "%-.*G", 16, 1.0);
    std::cout << buf << std::endl;

Displays "1". Formats up to 16 significant digits, with no trailing zeros, before reverting to scientific notation (which %G uses when the decimal exponent is less than -4 or at least the precision, here 16).
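To see exactly where `%G` switches notation, the format string can be wrapped in a small function (`g16` is a hypothetical name). With precision 16, `%G` uses fixed notation when the decimal exponent X satisfies -4 <= X < 16 and scientific notation otherwise, and without the `#` flag it removes trailing zeros. So 123.12 prints as "123.12", but 1e20 prints as "1E+20" and 0.00001 as "1E-05", which is what the question wanted to avoid.

```cpp
#include <cstdio>
#include <string>

// Wraps the answer's format string "%-.*G" with precision 16.
static std::string g16(double d)
{
    char buf[64]; // ample for 16 significant digits, sign, point and exponent
    std::snprintf(buf, sizeof buf, "%-.*G", 16, d);
    return buf;
}
```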

  • use snprintf and a char array instead of stringstream and string
  • pass dbl2str a pointer to a char buffer to print into (to avoid the string copy constructor being called on return). Assemble the line to be printed in a character buffer (or, when called, convert the char buffer to a string or append it to an existing string)
  • declare the function inline in a header file

    #include <cstdio>

    inline void dbl2str(char *buffer, int bufsize, double d)
    {
        /** the caller must make sure that there is enough memory allocated for buffer */
        /* "%.10f" matches the question's precision; the original "%lf" prints only 6 decimals */
        int len = snprintf(buffer, bufsize, "%.10f", d);
        /* on error or truncation, buffer[len] below would be out of bounds */
        if (len < 0 || len >= bufsize)
            return;
        /* len is the number of characters put into the buffer excluding the trailing \0
           so buffer[len] is the \0 and buffer[len-1] is the last 'visible' character */
        while (len >= 1 && buffer[len-1] == '0')
            --len;
        /* terminate the string where the last '0' character was
           or overwrite the existing 0 if there was no '0' */
        buffer[len] = 0;
        /* check for a trailing decimal point */
        if (len >= 1 && buffer[len-1] == '.')
            buffer[len-1] = 0;
    }
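The second bullet's idea, assembling a whole line in a caller-owned char buffer, might look like the sketch below. `format_row` is a hypothetical helper, and `dbl2str` is a compact version of the function above, here using `"%.10f"` to match the question and returning early on truncation, so the example compiles on its own. Nothing in it allocates on the heap.

```cpp
#include <cstdio>
#include <cstring>

// Compact char-buffer dbl2str: prints d, then strips trailing zeros and point.
inline void dbl2str(char *buffer, int bufsize, double d)
{
    int len = std::snprintf(buffer, bufsize, "%.10f", d);
    if (len < 0 || len >= bufsize) return; // error or truncated output
    while (len >= 1 && buffer[len - 1] == '0') --len;
    buffer[len] = '\0';
    if (len >= 1 && buffer[len - 1] == '.') buffer[len - 1] = '\0';
}

// Hypothetical helper: writes n values as one comma-separated line into
// `line` (capacity `cap`, assumed > 0). Returns the length written, or -1
// if the line might not fit.
int format_row(char *line, int cap, const double *values, int n)
{
    int pos = 0;
    for (int i = 0; i < n; ++i) {
        if (i != 0) {
            if (pos + 1 >= cap) return -1; // no room for ',' plus '\0'
            line[pos++] = ',';
        }
        dbl2str(line + pos, cap - pos, values[i]);
        pos += static_cast<int>(std::strlen(line + pos));
        if (pos >= cap - 1) return -1; // buffer full: output may be truncated
    }
    line[pos] = '\0';
    return pos;
}
```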

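Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.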

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM