
Improving/optimizing file write speed in C++

I've been running into some issues with writing to a file - namely, not being able to write fast enough.

To explain, my goal is to capture a stream of data coming in over gigabit Ethernet and simply save it to a file.

The raw data is coming in at a rate of 10 MS/s; it's saved to a buffer and subsequently written to a file.

Below is the relevant section of code:

    std::string path = "Stream/raw.dat";
    ofstream outFile(path, ios::out | ios::app | ios::binary);

    if (outFile.is_open())
        cout << "Yes" << endl;

    while (1)
    {
        rxSamples = rxStream->recv(&rxBuffer[0], rxBuffer.size(), metaData);
        switch (metaData.error_code)
        {
            //Irrelevant error checking...

            //Write data to a file
            std::copy(begin(rxBuffer), end(rxBuffer),
                      std::ostream_iterator<complex<float>>(outFile));
        }
    }

The issue I'm encountering is that it's taking too long to write the samples to a file. After a second or so, the device sending the samples reports its buffer has overflowed. After some quick profiling of the code, nearly all of the execution time is spent on std::copy(...) (99.96% of the time to be exact). If I remove this line, I can run the program for hours without encountering any overflow.

That said, I'm rather stumped as to how I can improve the write speed. I've looked through several posts on this site, and it seems like the most common suggestion (in regard to speed) is to implement file writes as I've already done - through the use of std::copy.

If it's helpful, I'm running this program on Ubuntu x86_64. Any suggestions would be appreciated.

So the main problem here is that you try to write in the same thread as you receive, which means that your recv() can only be called again after the copy is complete. A few observations:

  • Move the writing to a different thread. This is about a USRP, so GNU Radio might really be the tool of your choice -- it's inherently multithreaded.
  • Your output iterator is probably not the most performant solution. Simply write()ing to a file descriptor might be better, but that's a performance measurement you'd have to make yourself.
  • If your hard drive/file system/OS/CPU aren't up to the rates coming in from the USRP, then even decoupling receiving from writing thread-wise won't help -- get a faster system.
  • Try writing to a RAM disk instead.

In fact, I don't know how you came up with the std::copy approach. The rx_samples_to_file example that comes with UHD does this with a simple write, and you should definitely favor that over copying; file I/O can, on good OSes, often be done with one copy less, and iterating over all elements is probably very slow.

Let's do a bit of math.

Your samples are (apparently) of type std::complex<float>. Given a (typical) 32-bit float, that means each sample is 64 bits. At 10 MS/s, that means the raw data is around 80 megabytes per second--that's within what you can expect to write to a desktop (7200 RPM) hard drive, but getting fairly close to the limit (which is typically around 100-150 megabytes per second or so).

Unfortunately, despite the std::ios::binary, you're actually writing the data in text format (because std::ostream_iterator basically does stream << data;).

This not only loses some precision, but increases the size of the data, at least as a rule. The exact amount of increase depends on the data--a small integer value can actually decrease the quantity of data, but for arbitrary input, a size increase close to 2:1 is fairly common. With a 2:1 increase, your outgoing data is now around 160 megabytes/second--which is faster than most hard drives can handle.

The obvious starting point for an improvement would be to write the data in binary format instead:

uint32_t nItems = std::end(rxBuffer)-std::begin(rxBuffer);
outFile.write((char *)&nItems, sizeof(nItems));
outFile.write((char *)&rxBuffer[0], sizeof(rxBuffer));

For the moment I've used sizeof(rxBuffer) on the assumption that it's a real array. If it's actually a pointer or vector, you'll have to compute the correct size (what you want is the total number of bytes to be written).

I'd also note that as it stands right now, your code has an even more serious problem: since it hasn't specified a separator between elements when it writes the data, the data will be written without anything to separate one item from the next. That means if you wrote two values of (for example) 1 and 0.2, what you'd read back in would not be 1 and 0.2, but a single value of 10.2. Adding separators to your text output will add yet more overhead (figure around 15% more data) to a process that's already failing because it generates too much data.

Writing in binary format means each float will consume precisely 4 bytes, so delimiters are not necessary to read the data back in correctly.

The next step after that would be to descend to a lower-level file I/O routine. Depending on the situation, this might or might not make much difference. On Windows, you can specify FILE_FLAG_NO_BUFFERING when you open a file with CreateFile. This means that reads and writes to that file will basically bypass the cache and go directly to the disk.

In your case, that's probably a win--at 10 MS/s, you're probably going to use up the cache space quite a while before you reread the same data. In such a case, letting the data go into the cache gains you virtually nothing, but costs you some time to copy data to the cache, then somewhat later copy it out to the disk. Worse, it's likely to pollute the cache with all this data, so it's no longer storing other data that's a lot more likely to benefit from caching.
