
How to compile a vector with a huge amount of data in C++?

I am writing a C++ program which checks if some words exist in Catalan, so I have a vector with the Catalan dictionary:

const vector<string> dict={"aaron","ababol","abac","abaca","abacallanada","abacallanava","abacas","abacial", ... ,"zum-zum","zur","zuric","zwitterio"};

The problem is that the dictionary has 107776 entries, so when I attempt to compile the file:

g++ -Wall file.cc -std=c++0x -o file.exe

it does nothing for a while, and then Windows says that it isn't responding and closes it.

How can I compile it? Is there a better way of storing this type of data (arrays, ...)?

You may well have more luck with old-school built-in arrays:

char const * const dict[] = {"aaron",...};

This will generate a load of string literals and an array of pointers to them, which shouldn't be too much of a strain for the compiler. This will also use no more memory than necessary, with little or no work at runtime.
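As a rough illustration of the built-in array approach, here is a minimal sketch that uses a handful of words from the question as a stand-in for the full list. The names `dictSize`, `lessCStr` and `inDict` are made up for this example, and the binary search assumes the entries are kept sorted in `strcmp` order, which the excerpt in the question suggests but does not guarantee.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Tiny stand-in for the real 107776-entry list. The entries must stay
// sorted in strcmp order for the binary search below to be valid.
char const * const dict[] = {
    "aaron", "ababol", "abac", "abaca", "abacas", "abacial", "zur", "zuric"
};
const std::size_t dictSize = sizeof(dict) / sizeof(dict[0]);

// Orders two C strings the way strcmp does.
bool lessCStr(char const* a, char const* b) {
    return std::strcmp(a, b) < 0;
}

// Hypothetical helper: true if word is in dict. No std::string is built.
bool inDict(char const* word) {
    return std::binary_search(dict, dict + dictSize, word, lessCStr);
}
```

Calling `inDict("abaca")` on this sample finds the word without allocating anything at run time.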

Alternatively, std::array<char const *, N> (with N the number of entries, which std::array requires as a template argument) should be just as efficient, with more of a C++ look and feel.
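A sketch of the std::array variant might look as follows. Note that std::array takes the element count as a second template argument, so it has to be written out (8 for this small sample). The names `dictArray` and `inDictArray` are hypothetical, and the lookup is a plain linear scan to keep the example short.

```cpp
#include <algorithm>
#include <array>
#include <cstring>

// The element count is part of the type: 8 matches this sample only,
// the full dictionary would need its own count.
const std::array<char const*, 8> dictArray = {{
    "aaron", "ababol", "abac", "abaca", "abacas", "abacial", "zur", "zuric"
}};

// Hypothetical helper: linear scan comparing C strings with strcmp.
bool inDictArray(char const* word) {
    return std::any_of(dictArray.begin(), dictArray.end(),
                       [word](char const* entry) {
                           return std::strcmp(entry, word) == 0;
                       });
}
```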

Your version has to generate an enormous amount of code to build an initializer_list from those literals, construct a string from each, and add each string to the vector. It will also require more than twice as much memory, since each string literal needs to be copied into memory allocated at runtime, and then all those pointers need to be stored in another run-time allocated array.

The disadvantage of an array of char const * is that you may end up constructing a temporary std::string each time you read from the dictionary. If that's a concern, then an array of std::string might be a reasonable compromise.
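For comparison, a minimal sketch of the array-of-std::string compromise, again with a few sample words. The names `dictStrings` and `inDictStrings` are invented for illustration, and the entries are assumed to be sorted so that a binary search can be used. The strings are constructed once at program start, so lookups with a std::string argument create no temporaries.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Each literal is copied into a std::string once, before main() runs.
const std::string dictStrings[] = {
    "aaron", "ababol", "abac", "abaca", "abacas", "abacial", "zur", "zuric"
};

// Hypothetical helper: binary search over the sorted std::string array.
bool inDictStrings(const std::string& word) {
    return std::binary_search(std::begin(dictStrings),
                              std::end(dictStrings), word);
}
```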

Store it in an external file and load it on demand. This is the best solution; otherwise, I suppose you could split your vector into multiple vectors and maybe put them into separate cpp files.

Store the dictionary in a text file, one word per line. Then add this code to your program:

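// Needs <fstream> and <string>, and dict must be a non-const std::vector<std::string>.
{
  std::string inputFileName;  // set this to the path of your dictionary file
  std::ifstream inputFile(inputFileName);
  std::string word;
  while (std::getline(inputFile, word))  // reads one word per line
    dict.push_back(word);
}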

Would it be possible to load only a single section of the dictionary from file using the methods in other answers, i.e. load only the "a" words from file a.dic? Or do you need to have access to the entire dictionary at once?
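One way this per-letter idea could look is sketched below, assuming the dictionary has been split into files named after the first letter of the words they contain ("a.dic", "b.dic", and so on). That file layout, the class `LazyDict` and the helper `dictFileFor` are all assumptions made for this example. Each file is read the first time a word with that initial is looked up and is cached afterwards.

```cpp
#include <fstream>
#include <map>
#include <string>
#include <unordered_set>

// Hypothetical naming scheme: words starting with 'a' live in "a.dic".
std::string dictFileFor(const std::string& word) {
    if (word.empty()) return std::string();
    return std::string(1, word[0]) + ".dic";
}

class LazyDict {
public:
    // Loads the file for the word's first letter on first use, then
    // answers from the cached set.
    bool contains(const std::string& word) {
        if (word.empty()) return false;
        const char key = word[0];
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, loadFile(dictFileFor(word))).first;
        }
        return it->second.count(word) != 0;
    }

private:
    // Reads one word per line; a missing file simply yields an empty set.
    static std::unordered_set<std::string> loadFile(const std::string& path) {
        std::unordered_set<std::string> words;
        std::ifstream in(path);
        std::string w;
        while (std::getline(in, w)) {
            if (!w.empty()) words.insert(w);
        }
        return words;
    }

    std::map<char, std::unordered_set<std::string>> cache_;
};
```

Whether this pays off depends on the access pattern: if the program checks words with many different initials, it ends up loading most of the dictionary anyway.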


 