
Fastest way to read in a file in C++

I would like to read in a file like this:

13.3027 29.2191 2.39999
13.3606 29.1612 2.39999
13.3586 29.0953 2.46377
13.4192 29.106 2.37817

It has more than 1 million lines.

My current C++ code is:

bool loadCloud(const string &filename, PointCloud<PointXYZ> &cloud)
{
    print_info("\nLoad the Cloud .... (this takes some time!!!) \n");
    ifstream fs;
    fs.open(filename.c_str(), ios::binary);
    if (!fs.is_open() || fs.fail())
    {
        PCL_ERROR(" Could not open file '%s'! Error : %s\n", filename.c_str(), strerror(errno));
        fs.close();
        return (false);
    }

    string line;
    vector<string> st;

    while (!fs.eof())
    {
        getline(fs, line);
        // Ignore empty lines
        if (line == "") 
        {
            std::cout << "  this line is empty...." << std::endl;
            continue;
        }

        // Tokenize the line
        boost::trim(line);
        boost::split(st, line, boost::is_any_of("\t\r "), boost::token_compress_on);

        cloud.push_back(PointXYZ(float(atof(st[0].c_str())), float(atof(st[1].c_str())), float(atof(st[2].c_str()))));
    }
    fs.close();
    std::cout<<"    Size of loaded cloud:   " << cloud.size()<<" points" << std::endl;
    cloud.width = uint32_t(cloud.size()); cloud.height = 1; cloud.is_dense = true;
    return (true);
}

Reading this file currently takes a really long time. I would like to speed this up. Any ideas how to do that?

You can just read the numbers instead of the whole line plus parsing, as long as the numbers always come in sets of three.

void readFile(const std::string& fileName, PointCloud<PointXYZ>& cloud)
{
    std::ifstream infile(fileName);

    float vertex[3];
    int coordinateCounter = 0;

    // operator>> skips whitespace (including newlines), so no line splitting is needed
    while (infile >> vertex[coordinateCounter])
    {
        coordinateCounter++;
        if (coordinateCounter == 3)
        {
            cloud.push_back(PointXYZ(vertex[0], vertex[1], vertex[2]));
            coordinateCounter = 0;
        }
    }
}

Are you running optimised code? On my machine your code reads a million values in 1800ms.

The trim and the split are probably taking most of the time. If there is white space at the beginning of the string, trim has to move the whole string contents to erase the first characters. split creates new string copies; you can optimise this by using string_view to avoid the copies.
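As a rough illustration of the string_view idea, here is a minimal sketch of a whitespace splitter that returns views into the original line instead of new strings. The name splitWhitespace is made up for this example and is not part of Boost or the standard library; it assumes C++17.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: split `text` on runs of whitespace.
// The returned views point into `text`, so no characters are copied;
// the underlying buffer must outlive the result.
std::vector<std::string_view> splitWhitespace(std::string_view text)
{
    std::vector<std::string_view> tokens;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        // Skip the whitespace in front of the next token
        while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos])))
            ++pos;
        const std::size_t start = pos;
        // Advance to the end of the token
        while (pos < text.size() && !std::isspace(static_cast<unsigned char>(text[pos])))
            ++pos;
        if (pos > start)
            tokens.push_back(text.substr(start, pos - start));
    }
    return tokens;
}
```

The tokens still have to be converted to floats afterwards, and the vector of views is itself an allocation, so this only removes the per-token string copies.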

As your separators are white space you can avoid all the copies with code like this:

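// Requires C++17 and a standard library that implements std::from_chars for float.
#include <array>
#include <cctype>
#include <charconv>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

bool loadCloud(const std::string &filename, std::vector<std::array<float, 3>> &cloud)
{
    std::ifstream fs(filename, std::ios::binary);
    if (!fs)
    {
        return false;
    }

    std::string line;
    while (std::getline(fs, line))
    {
        // Ignore empty lines
        if (line.empty())
        {
            continue;
        }

        const char* first = line.data();
        const char* last = first + line.length();
        std::array<float, 3> arr;
        for (float& f : arr)
        {
            // Parse one float in place; from_chars does not skip leading whitespace
            auto result = std::from_chars(first, last, f);
            if (result.ec != std::errc{})
            {
                return false;
            }
            first = result.ptr;
            // Skip the separator (or a trailing '\r') after the number
            while (first != last && std::isspace(static_cast<unsigned char>(*first)))
            {
                first++;
            }
        }
        // Anything left over means the line had more than three fields
        if (first != last)
        {
            return false;
        }

        cloud.push_back(arr);
    }
    return true;
}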

On my machine this code runs in 650 ms. About 35% of the time is used by getline, 45% by parsing the floats, and the remaining 20% by push_back.
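To compare timings on your own machine, a small stopwatch helper is usually enough. This is only a sketch; elapsedMilliseconds is a name invented for this example, and the numbers it gives are only meaningful for an optimised build.

```cpp
#include <chrono>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double elapsedMilliseconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the call as `elapsedMilliseconds([&] { loadCloud(filename, cloud); })` would time the whole load, including the file I/O.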

A few notes:

  1. I've fixed the while(!fs.eof()) issue by checking the state of the stream after calling getline.
  2. I've changed the result to an array because your example wasn't an MCVE, so I didn't have a definition of PointCloud or PointXYZ. It's possible that these types are the cause of your slowness.
  3. If you know the number of lines (or at least an approximation) in advance, then reserving the size of the vector would improve performance.
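For the last note, one way to get an approximate line count without reading the file twice is to divide the file size by a typical line length. The sketch below assumes about 24 bytes per line, which matches the sample rows in the question (three 7-character numbers, two spaces and a newline); estimateLineCount and its default are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: guess the number of lines from the file size.
// bytesPerLine is an assumption about the data, not something measured.
std::size_t estimateLineCount(const std::string& filename, std::size_t bytesPerLine = 24)
{
    // Open at the end so tellg() reports the file size
    std::ifstream fs(filename, std::ios::binary | std::ios::ate);
    if (!fs)
    {
        return 0;
    }
    const std::streamoff size = fs.tellg();
    if (size <= 0)
    {
        return 0;
    }
    return static_cast<std::size_t>(size) / bytesPerLine + 1;
}
```

Calling `cloud.reserve(estimateLineCount(filename));` before the read loop then avoids most reallocations. An estimate that is too low still works, the vector just grows as usual.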
