
Parsing a txt file into a vector more efficiently in C++

My program uses ifstream() and getline() to parse a text file into objects that are two vectors deep, i.e. a vector inside a vector. The inner vector contains over 250,000 string objects once the text file has finished loading.

This is painfully slow. Is there an alternative in std that is more efficient than using ifstream() and getline()?

Thanks

UPDATE:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>

using namespace std;

class Word
{
private:
    string moniker = "";
    vector <string> definition;
    string type = "";

public:
    void setMoniker(string m) { this->moniker = m; }
    void setDefinition(string d) { this->definition.push_back(d); }
    void setType(string t) { this->type = t; }
    int getDefinitionSize() { return this->definition.size(); }

    string getMoniker() { return this->moniker; }
    void printDefinition()
    {
        for (int i = 0; i < definition.size(); i++)
        {
            cout << definition[i] << endl;
        }

    }


    string getType() { return this->type; }
};

class Dictionary
{
private:
    vector<Word> Words;

public:
    void addWord(Word w) { this->Words.push_back(w); }
    Word getWord(int i) { return this->Words[i]; }
    int getTotalNumberOfWords() { return this->Words.size(); }
    void loadDictionary(string f)
    {
        const regex _IS_DEF("[.]|[ ]"), // a '.' or a space marks a definition line ("\." and "\ " are not valid string escapes)
            _IS_TYPE("^misc$|^n$|^adj$|^v$|^adv$|^prep$|^pn$|^n_and_v$"),
            _IS_NEWLINE("\n");

        string line;

        ifstream dict(f);

        string m, t, d = "";

        while (dict.is_open())
        {
            while (getline(dict, line))
            {
                if (regex_search(line, _IS_DEF))
                {
                    d = line;
                }
                else if (regex_search(line, _IS_TYPE))
                {
                    t = line;
                }
                else if (!(line == ""))
                {
                    m = line;
                }
                else
                {
                    Word w;
                    w.setMoniker(m);
                    w.setType(t);
                    w.setDefinition(d);
                    this->addWord(w);
                }
            }
            dict.close();
        }
    }
};



int main()
{
    Dictionary dictionary;
    dictionary.loadDictionary("dictionary.txt");
    return 0;
}

You should reduce your memory allocations. Having a vector of vectors is usually not a good idea, because every inner vector does its own new and delete.
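
As a rough sketch of that first point: assuming every word carries exactly one definition line (the loader in the question only ever pushes one), the inner vector could be replaced by a plain string, and the strings could be moved rather than copied. FlatWord and FlatDictionary are hypothetical names used for illustration, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened record: one definition string per word instead of a
// nested std::vector<std::string>, so no inner vector is ever allocated.
struct FlatWord {
    std::string moniker;
    std::string type;
    std::string definition;
};

class FlatDictionary {
public:
    // The strings are taken by value and moved into place, so a caller that
    // passes std::move(...) avoids copying the character buffers.
    void addWord(std::string moniker, std::string type, std::string definition) {
        words_.push_back(FlatWord{std::move(moniker), std::move(type),
                                  std::move(definition)});
    }

    // Returns a reference instead of a copy of the whole record.
    const FlatWord& getWord(std::size_t i) const {
        assert(i < words_.size());  // out-of-range access is a caller bug
        return words_[i];
    }

    std::size_t size() const { return words_.size(); }

private:
    std::vector<FlatWord> words_;
};
```

If a word can genuinely have several definitions, this layout does not apply as written; whether it helps at all is something to confirm by measuring.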

You should reserve() the approximate number of elements you need in the vector at the start.
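
A minimal sketch of reserving up front, assuming the caller has a rough idea of how many lines the file holds. readLines, readLinesFromText, and the expectedLines parameter are illustrative names, not taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line from `in` into a vector whose capacity is reserved once,
// so push_back does not have to regrow the storage while loading as long as
// the estimate is not exceeded.
std::vector<std::string> readLines(std::istream& in, std::size_t expectedLines) {
    std::vector<std::string> lines;
    lines.reserve(expectedLines);               // one allocation up front
    assert(lines.capacity() >= expectedLines);  // guaranteed by reserve()

    std::string line;                           // reused for every getline call
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Convenience wrapper for trying the sketch on an in-memory string
// instead of a file.
std::vector<std::string> readLinesFromText(const std::string& text,
                                           std::size_t expectedLines) {
    std::istringstream in(text);
    return readLines(in, expectedLines);
}
```

The estimate does not need to be exact: if it is too low the vector still grows as usual, and if it is too high some capacity is simply left unused.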

You should use fgets() if you don't actually need to extract std::string to get your work done. For example, if the objects can be parsed from char arrays, do that. Make sure to read into the same string buffer every time, rather than creating new buffers.
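
The fgets() suggestion might look roughly like the sketch below, which reads every line into one reused char buffer and classifies it without building a std::string. isTypeLine and countTypeLines are hypothetical helpers. The plain strcmp comparison is swapped in for the _IS_TYPE regex from the question, and the 4096-byte buffer assumes no line is longer than that (a longer line would be split across several reads).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Returns true if `line` is exactly one of the word-type tokens listed in the
// question. This is a plain string comparison, not the original regex.
bool isTypeLine(const char* line) {
    static const char* const kTypes[] = {
        "misc", "n", "adj", "v", "adv", "prep", "pn", "n_and_v"};
    for (const char* type : kTypes) {
        if (std::strcmp(line, type) == 0) {
            return true;
        }
    }
    return false;
}

// Counts the type lines in an already opened file, reusing one stack buffer
// for every read instead of allocating a new string per line.
std::size_t countTypeLines(std::FILE* fp) {
    assert(fp != nullptr);
    char buf[4096];  // assumed upper bound on line length
    std::size_t count = 0;
    while (std::fgets(buf, sizeof buf, fp) != nullptr) {
        buf[std::strcspn(buf, "\r\n")] = '\0';  // strip the line ending
        if (isTypeLine(buf)) {
            ++count;
        }
    }
    return count;
}
```

The same loop shape would apply to the moniker and definition lines; only the fields that must outlive the buffer need to be copied into std::string objects.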

And most important of all, use a profiler.
