TCLAP makes a multi-threaded program slower

Question

TCLAP is a templatized, header-only C++ library for parsing command-line arguments.

I'm using TCLAP to process command-line arguments in a multi-threaded program: the arguments are read in the main function, then multiple threads are started to work on the task defined by the arguments (some parameters for an NLP task).

I've started displaying the number of words per second processed by the threads, and I've found that if I hard-code the arguments into main instead of reading them from the command line with TCLAP, the throughput is 6 times higher!

I'm compiling with gcc and the -O2 flag, which gives a speedup of about 10 times over an unoptimized build (when TCLAP is not being used)... so it seems that using TCLAP somehow negates part of the advantage of compiler optimization.

Here's what the main function, the only place I use TCLAP, looks like:

int main(int argc, char** argv)
{
    uint32_t mincount;
    uint32_t dim;
    uint32_t contexthalfwidth;
    uint32_t negsamples;
    uint32_t numthreads;
    uint32_t randomseed;
    string corpus_fname;
    string output_basefname;
    string vocab_fname;

    Eigen::initParallel();

    try {
        TCLAP::CmdLine cmd("Driver for various word embedding models", ' ', "0.1");
        TCLAP::ValueArg<uint32_t> dimArg("d","dimension","dimension of word representations",false,300,"uint32_t");
        TCLAP::ValueArg<uint32_t> mincountArg("m", "mincount", "required minimum occurrence count to be added to vocabulary",false,5,"uint32_t");
        TCLAP::ValueArg<uint32_t> contexthalfwidthArg("c", "contexthalfwidth", "half window size of a context frame",false,15,"uint32_t");
        TCLAP::ValueArg<uint32_t> numthreadsArg("t", "numthreads", "number of threads",false,12,"uint32_t");
        TCLAP::ValueArg<uint32_t> negsamplesArg("n", "negsamples", "number of negative samples for skipgram model",false,15,"uint32_t");
        TCLAP::ValueArg<uint32_t> randomseedArg("s", "randomseed", "seed for random number generator",false,2014,"uint32_t");
        TCLAP::UnlabeledValueArg<string> corpus_fnameArg("corpusfname", "file containing the training corpus, one paragraph or sentence per line", true, "corpus", "corpusfname");
        TCLAP::UnlabeledValueArg<string> output_basefnameArg("outputbasefname", "base filename for the learnt word embeddings", true, "wordreps-", "outputbasefname");
        TCLAP::ValueArg<string> vocab_fnameArg("v", "vocabfname", "filename for the vocabulary and word counts", false, "wordsandcounts.txt", "filename");
        cmd.add(dimArg);
        cmd.add(mincountArg);
        cmd.add(contexthalfwidthArg);
        cmd.add(numthreadsArg);
        cmd.add(randomseedArg);
        cmd.add(corpus_fnameArg);
        cmd.add(output_basefnameArg);
        cmd.add(vocab_fnameArg);
        cmd.parse(argc, argv);

        mincount = mincountArg.getValue();
        dim = dimArg.getValue();
        contexthalfwidth = contexthalfwidthArg.getValue();
        negsamples = negsamplesArg.getValue();
        numthreads = numthreadsArg.getValue();
        randomseed = randomseedArg.getValue();
        corpus_fname = corpus_fnameArg.getValue();
        output_basefname = output_basefnameArg.getValue();
        vocab_fname = vocab_fnameArg.getValue();
    }
    catch (TCLAP::ArgException &e) {};

    /*
    uint32_t mincount = 5;
    uint32_t dim = 50;
    uint32_t contexthalfwidth = 15;
    uint32_t negsamples = 15;
    uint32_t numthreads = 10;
    uint32_t randomseed = 2014;
    string corpus_fname = "imdbtrain.txt";
    string output_basefname = "wordreps-";
    string vocab_fname = "wordsandcounts.txt";
    */

    string test_fname = "imdbtest.txt";
    string output_fname = "parreps.txt";
    string countmat_fname = "counts.hdf5";
    Vocabulary * vocab;

    vocab = determineVocabulary(corpus_fname, mincount);
    vocab->dump(vocab_fname);

    Par2VecModel p2vm = Par2VecModel(corpus_fname, vocab, dim, contexthalfwidth, negsamples, randomseed);
    p2vm.learn(numthreads);
    p2vm.save(output_basefname);
    p2vm.learnparreps(test_fname, output_fname, numthreads);

}

The only place multithreading is used is in the Par2VecModel::learn function:

void Par2VecModel::learn(uint32_t numthreads) {
    thread* workers;
    workers = new thread[numthreads];
    uint64_t numwords = 0;
    bool killflag = 0;
    uint32_t randseed;

    ifstream filein(corpus_fname.c_str(), ifstream::ate | ifstream::binary);
    uint64_t filesize = filein.tellg();

    fprintf(stderr, "Total number of in vocab words to train over: %u\n", vocab->gettotalinvocabwords());

    for(uint32_t idx = 0; idx < numthreads; idx++) {
        randseed = eng();
        workers[idx] = thread(skipgram_training_thread, this, numthreads, idx, filesize, randseed, std::ref(numwords));
    }

    thread monitor(monitor_training_thread, this, numthreads, std::ref(numwords), std::ref(killflag));

    for(uint32_t idx = 0; idx < numthreads; idx++)
        workers[idx].join();

    killflag = true;
    monitor.join();
}

This section does not involve TCLAP at all, so what's going on? (I'm also using C++11 features, so I compile with the -std=c++11 flag, if that makes a difference.)

Answer 1

This has been open for a long time, so this suggestion may no longer be useful, but I'd first check what happens if you replace TCLAP with a "simple" parser (i.e. just pass the arguments on the command line in a specific fixed order and convert them to the right type). It's highly unlikely that the issue is due to TCLAP (i.e. I can't imagine any mechanism for such behavior). It's conceivable that with hard-coded values the compiler is able to do some compile-time optimizations that aren't possible when those values must be variables. Still, the size of the performance difference seems somewhat pathological, so I suspect something else is going on.
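The "simple" parser suggested above could look roughly like the sketch below. It is only an illustration: `Params`, `to_u32`, `parse_fixed_order` and `describe` are hypothetical names that do not appear in the question, and the fixed argument order is an assumption. Printing the effective values is included because, judging from the code posted in the question, the two runs may not be using the same settings: the TCLAP default for the dimension is 300 while the commented-out hard-coded block uses `dim = 50`, the thread defaults are 12 versus 10, and `negsamplesArg` is never passed to `cmd.add`. It may be worth ruling that out before looking for an effect of TCLAP itself.

```cpp
#include <cstdint>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the parameters named in the question.
struct Params {
    std::uint32_t mincount;
    std::uint32_t dim;
    std::uint32_t contexthalfwidth;
    std::uint32_t negsamples;
    std::uint32_t numthreads;
    std::uint32_t randomseed;
    std::string corpus_fname;
    std::string output_basefname;
    std::string vocab_fname;
};

// Convert one argument to uint32_t; reject trailing garbage, negative
// numbers and values that do not fit in 32 bits.
std::uint32_t to_u32(const std::string& text) {
    std::size_t used = 0;
    unsigned long value = std::stoul(text, &used);  // throws if not numeric
    if (used != text.size() || text[0] == '-' ||
        value > std::numeric_limits<std::uint32_t>::max()) {
        throw std::invalid_argument("not a uint32_t: " + text);
    }
    return static_cast<std::uint32_t>(value);
}

// Assumed fixed order: mincount dim contexthalfwidth negsamples numthreads
// randomseed corpus_fname output_basefname vocab_fname
Params parse_fixed_order(const std::vector<std::string>& args) {
    if (args.size() != 9) {
        throw std::invalid_argument("expected exactly 9 arguments");
    }
    Params p;
    p.mincount = to_u32(args[0]);
    p.dim = to_u32(args[1]);
    p.contexthalfwidth = to_u32(args[2]);
    p.negsamples = to_u32(args[3]);
    p.numthreads = to_u32(args[4]);
    p.randomseed = to_u32(args[5]);
    p.corpus_fname = args[6];
    p.output_basefname = args[7];
    p.vocab_fname = args[8];
    return p;
}

// One-line summary of the values the program will actually run with.
std::string describe(const Params& p) {
    std::ostringstream out;
    out << "mincount=" << p.mincount << " dim=" << p.dim
        << " contexthalfwidth=" << p.contexthalfwidth
        << " negsamples=" << p.negsamples
        << " numthreads=" << p.numthreads
        << " randomseed=" << p.randomseed
        << " corpus=" << p.corpus_fname
        << " output=" << p.output_basefname
        << " vocab=" << p.vocab_fname;
    return out.str();
}
```

In main, one would build a `std::vector<std::string>` from `argv + 1` to `argv + argc`, call `parse_fixed_order`, and write `describe(p)` to stderr before training starts. The same one-line summary can be printed in the TCLAP and hard-coded builds, so that a throughput comparison is only made between runs that report identical parameters.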

Solution 1
0 2015-07-13 02:54:35
