
Fastest way to read in a file in C++

I would like to read in a file like this:

13.3027 29.2191 2.39999
13.3606 29.1612 2.39999
13.3586 29.0953 2.46377
13.4192 29.106 2.37817

It has more than 1 million lines.

My current C++ code is:

bool loadCloud(const string &filename, PointCloud<PointXYZ> &cloud)
{
    print_info("\nLoad the Cloud .... (this takes some time!!!) \n");
    ifstream fs;
    fs.open(filename.c_str(), ios::binary);
    if (!fs.is_open() || fs.fail())
    {
        PCL_ERROR(" Could not open file '%s'! Error : %s\n", filename.c_str(), strerror(errno));
        fs.close();
        return (false);
    }

    string line;
    vector<string> st;

    while (!fs.eof())
    {
        getline(fs, line);
        // Ignore empty lines
        if (line == "") 
        {
            std::cout << "  this line is empty...." << std::endl;
            continue;
        }

        // Tokenize the line
        boost::trim(line);
        boost::split(st, line, boost::is_any_of("\t\r "), boost::token_compress_on);

        cloud.push_back(PointXYZ(float(atof(st[0].c_str())), float(atof(st[1].c_str())), float(atof(st[2].c_str()))));
    }
    fs.close();
    std::cout<<"    Size of loaded cloud:   " << cloud.size()<<" points" << std::endl;
    cloud.width = uint32_t(cloud.size()); cloud.height = 1; cloud.is_dense = true;
    return (true);
}

Reading this file currently takes a really long time. I would like to speed this up. Any ideas how to do that?

You can read the numbers directly instead of reading the whole line and then parsing it, as long as the numbers always come in sets of three.

void readFile(const std::string& fileName, PointCloud<PointXYZ>& cloud)
{
    std::ifstream infile(fileName);

    float vertex[3];
    int coordinateCounter = 0;

    // operator>> skips whitespace (including newlines) before each number
    while (infile >> vertex[coordinateCounter])
    {
        coordinateCounter++;
        if (coordinateCounter == 3)
        {
            // A full x, y, z triple has been read
            cloud.push_back(PointXYZ(vertex[0], vertex[1], vertex[2]));
            coordinateCounter = 0;
        }
    }
}

Are you running optimised code? On my machine your code reads a million values in 1800ms.

The trim and the split are probably taking most of the time. If there is white space at the beginning of the string, trim has to copy the whole string contents to erase the first characters. split creates new string copies; you can optimise this by using string_view to avoid the copies.
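As a rough illustration of the string_view idea, here is a minimal sketch of a whitespace splitter that returns views into the original line instead of new strings. The name splitFields is hypothetical (it is not a Boost or standard library function), and it assumes C++17.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library):
// splits `line` on spaces, tabs and carriage returns, treating runs of
// separators as one. The returned views point into `line`, so no
// characters are copied.
std::vector<std::string_view> splitFields(std::string_view line)
{
    constexpr std::string_view separators = " \t\r";
    std::vector<std::string_view> fields;

    // Skipping leading separators here replaces the separate trim step
    std::size_t pos = line.find_first_not_of(separators);
    while (pos != std::string_view::npos)
    {
        std::size_t end = line.find_first_of(separators, pos);
        if (end == std::string_view::npos)
        {
            end = line.size();
        }
        fields.push_back(line.substr(pos, end - pos));
        pos = line.find_first_not_of(separators, end);
    }
    return fields;
}
```

The views are only valid while the string they were taken from is alive and unmodified, so they should be consumed before the next getline call overwrites the line. The returned vector still allocates, so this removes the per-field string copies but not every allocation.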

As your separators are white space, you can avoid all the copies with code like this:

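#include <array>
#include <cctype>
#include <charconv>      // std::from_chars (C++17; needs a library with the float overload)
#include <fstream>
#include <string>
#include <system_error>  // std::errc
#include <vector>

bool loadCloud(const std::string &filename, std::vector<std::array<float, 3>> &cloud)
{
    std::ifstream fs(filename, std::ios::binary);
    if (!fs)
    {
        return false;
    }

    std::string line;
    while (std::getline(fs, line))
    {
        // Ignore empty lines
        if (line.empty())
        {
            continue;
        }

        // Parse three floats in place; no temporary strings are created
        const char* first = line.data();
        const char* last = first + line.length();
        std::array<float, 3> arr;
        for (float& f : arr)
        {
            auto result = std::from_chars(first, last, f);
            if (result.ec != std::errc{})
            {
                return false;
            }
            first = result.ptr;
            // Skip the separators (and a trailing '\r', if any) after the number
            while (first != last && std::isspace(static_cast<unsigned char>(*first)))
            {
                first++;
            }
        }
        // Anything left over means the line had more than three fields
        if (first != last)
        {
            return false;
        }

        cloud.push_back(arr);
    }
    return true;
}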

On my machine this code runs in 650ms. About 35% of the time is used by getline, 45% by parsing the floats, and the remaining 20% by push_back.
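Timings like these depend on the machine and the build settings, so it is worth measuring on your own data. A minimal sketch of a timing helper follows; elapsedMilliseconds is a hypothetical name, and it only gives a total wall-clock time, not a per-function breakdown.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double elapsedMilliseconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be called as elapsedMilliseconds([&] { loadCloud(filename, cloud); }) in an optimised build, ideally several times, since the first run may also include the cost of reading the file from disk.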

A few notes:

  1. I've fixed the while(!fs.eof()) issue by checking the state of the stream after calling getline.
  2. I've changed the result to an array because your example wasn't an MCVE, so I didn't have a definition of PointCloud or PointXYZ; it's possible that these types are the cause of your slowness.
  3. If you know the number of lines (or at least an approximation) in advance, then reserving the size of the vector would improve performance.
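For the third note, one way to get an approximate line count without reading the file twice is to divide the file size by a typical line length. The sketch below is only an illustration: estimatePointCount is a hypothetical helper, and the default of 24 bytes per line is an assumption based on the sample lines in the question (three 7-character numbers, two spaces and a newline).

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: guesses how many points a text file holds from its
// size in bytes. `assumedBytesPerLine` is an assumption about the data
// (about 24 bytes for lines like "13.3027 29.2191 2.39999\n").
std::size_t estimatePointCount(const std::string& filename,
                               std::size_t assumedBytesPerLine = 24)
{
    // Open at the end of the file so tellg() reports the size in bytes
    std::ifstream in(filename, std::ios::binary | std::ios::ate);
    if (!in || assumedBytesPerLine == 0)
    {
        return 0; // unknown size: the caller simply reserves nothing
    }
    const std::streamoff bytes = in.tellg();
    if (bytes <= 0)
    {
        return 0;
    }
    return static_cast<std::size_t>(bytes) / assumedBytesPerLine;
}
```

With a std::vector result, calling cloud.reserve(estimatePointCount(filename)) before the read loop would avoid most reallocations. An estimate that is slightly too low still helps, because the vector then only has to grow once or twice.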

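Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.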

 