Parsing a text file into a vector more efficiently in C++

My program uses ifstream and getline() to parse a text file into objects that are two vectors deep, i.e. a vector inside a vector. Once the text file has finished loading, the inner vector contains over 250,000 string objects.

This is painfully slow. Is there an alternative in the standard library that is more efficient than using ifstream and getline()?

Thanks

UPDATE:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>

using namespace std;

class Word
{
private:
    string moniker = "";
    vector <string> definition;
    string type = "";

public:
    void setMoniker(string m) { this->moniker = m; }
    void setDefinition(string d) { this->definition.push_back(d); }
    void setType(string t) { this->type = t; }
    int getDefinitionSize() { return this->definition.size(); }

    string getMoniker() { return this->moniker; }
    void printDefinition()
    {
        for (size_t i = 0; i < definition.size(); i++)
        {
            cout << definition[i] << endl;
        }

    }


    string getType() { return this->type; }
};

class Dictionary
{
private:
    vector<Word> Words;

public:
    void addWord(Word w) { this->Words.push_back(w); }
    Word getWord(int i) { return this->Words[i]; }
    int getTotalNumberOfWords() { return this->Words.size(); }
    void loadDictionary(string f)
    {
        // A definition line is any line that contains a period or a space.
        const regex _IS_DEF("[. ]"),
            _IS_TYPE("^misc$|^n$|^adj$|^v$|^adv$|^prep$|^pn$|^n_and_v$"),
            _IS_NEWLINE("\n");

        string line;

        ifstream dict(f);

        string m, t, d = "";

        if (dict.is_open())  // the body reads the whole file once, then closes it
        {
            while (getline(dict, line))
            {
                if (regex_search(line, _IS_DEF))
                {
                    d = line;
                }
                else if (regex_search(line, _IS_TYPE))
                {
                    t = line;
                }
                else if (!(line == ""))
                {
                    m = line;
                }
                else
                {
                    Word w;
                    w.setMoniker(m);
                    w.setType(t);
                    w.setDefinition(d);
                    this->addWord(w);
                }
            }
            dict.close();
        }
    }
};



int main()
{
    Dictionary dictionary;
    dictionary.loadDictionary("dictionary.txt");
    return 0;
}

You should reduce your memory allocations. Having a vector of vectors is usually not a good idea, because every inner vector performs its own heap allocation (new) and deallocation (delete).
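One way to act on this, sketched under assumptions: instead of giving every Word its own inner vector, keep all definitions in a single shared vector and let each word remember which slice of it is its own. The names FlatWord and FlatDictionary are hypothetical and do not appear in the question. The sketch also assumes that a word's definitions are added right after the word itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat layout (names are illustrative, not from the question).
struct FlatWord {
    std::string moniker;
    std::string type;
    std::size_t firstDefinition = 0;  // index of this word's first definition
    std::size_t definitionCount = 0;  // how many definitions belong to it
};

class FlatDictionary {
public:
    // Parameters are taken by value and moved, so temporaries are not copied.
    void addWord(std::string moniker, std::string type) {
        FlatWord w;
        w.moniker = std::move(moniker);
        w.type = std::move(type);
        w.firstDefinition = definitions_.size();
        words_.push_back(std::move(w));
    }

    // Appends a definition to the most recently added word.
    void addDefinition(std::string text) {
        assert(!words_.empty() && "call addWord() first");
        definitions_.push_back(std::move(text));
        ++words_.back().definitionCount;
    }

    std::size_t wordCount() const { return words_.size(); }

    const FlatWord& word(std::size_t i) const { return words_[i]; }

    // Returns the k-th definition of the i-th word.
    const std::string& definition(std::size_t i, std::size_t k) const {
        assert(k < words_[i].definitionCount);
        return definitions_[words_[i].firstDefinition + k];
    }

private:
    std::vector<FlatWord> words_;
    std::vector<std::string> definitions_;  // shared by all words
};
```

Each std::string may still allocate memory for its own characters, so this only removes the per-word inner vector. Whether that matters for your file is something to measure rather than assume.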

You should reserve() the approximate number of elements you need in the vector before you start filling it, so that it does not have to reallocate repeatedly as it grows.
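A minimal sketch of what that could look like. The function names readLines and readLinesFromText are made up for illustration, and expectedCount is an estimate the caller has to supply (for the file in the question, something around 250,000).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line of `in` into a vector that was sized up front.
// `expectedCount` is a caller-supplied estimate (an assumption of this sketch).
std::vector<std::string> readLines(std::istream& in, std::size_t expectedCount) {
    std::vector<std::string> lines;
    lines.reserve(expectedCount);               // no regrowth until the estimate is exceeded
    assert(lines.capacity() >= expectedCount);  // guaranteed by reserve()

    std::string line;                           // reused for every getline() call
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Convenience wrapper for trying the function without a file on disk.
std::vector<std::string> readLinesFromText(const std::string& text,
                                           std::size_t expectedCount) {
    std::istringstream in(text);
    return readLines(in, expectedCount);
}
```

If the estimate is too low, the vector simply grows as usual once it is exceeded, so a rough guess is enough.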

You should use fgets() if you don't actually need a std::string to get your work done. For example, if the objects can be parsed from char arrays, do that. Make sure to read into the same buffer every time, rather than creating new buffers.
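As a rough illustration, the type check from the question could be done directly on the char array that fgets() fills, using strcmp() instead of a regex. The helper names isTypeLine and countTypeLines and the 4096-byte buffer size are assumptions for this sketch. It also assumes every line fits in the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// True if `line` is exactly one of the type tags used in the question.
bool isTypeLine(const char* line) {
    static const char* const kTypes[] = {
        "misc", "n", "adj", "v", "adv", "prep", "pn", "n_and_v"};
    for (const char* type : kTypes) {
        if (std::strcmp(line, type) == 0) return true;
    }
    return false;
}

// Counts type lines in the file at `path`; returns 0 if it cannot be opened.
// Assumes every line fits in the buffer; longer lines would arrive in pieces.
std::size_t countTypeLines(const char* path) {
    assert(path != nullptr);
    std::FILE* file = std::fopen(path, "r");
    if (file == nullptr) return 0;

    char buffer[4096];  // the same buffer is reused for every line
    std::size_t count = 0;
    while (std::fgets(buffer, sizeof buffer, file) != nullptr) {
        buffer[std::strcspn(buffer, "\r\n")] = '\0';  // strip the line ending
        if (isTypeLine(buffer)) ++count;
    }
    std::fclose(file);
    return count;
}
```

The same loop shape would work for building the Word objects. Counting is used here only to keep the example short.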

And most important of all, use a profiler.
