
How to remove stopwords more efficiently using C++

I have a stopword dictionary and a word list that needs to be processed. How can I write this so that it runs more efficiently?

My code loads the dictionary into memory as a vector named stopword, then iterates over the word list and checks whether each word is in stopword; if it is not, the word is copied with strcpy into the new word list.

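for (i = 0; i < len; i++)
{
    // std::vector has no find() member, so use std::find from <algorithm>;
    // this is a linear scan of stopword for every word.
    if (std::find(stopword.begin(), stopword.end(), a[i]) == stopword.end())
        strcpy(new_word[n++], a[i]);  // not a stopword: keep it (n counts the kept words; assumed)
}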

But this effectively needs two loops: the outer loop over the word list, plus a search through the stopword vector for every word. Is there a better way to do it? Should I use a hash instead of a vector?

You can store your stopwords in a structure called a trie. It is a prefix-based tree that lets you check a word against all stopwords at once, character by character, so a lookup takes time proportional to the length of the word rather than to the number of stopwords.
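As a rough illustration of the idea, here is a minimal sketch of such a trie. It assumes the words contain only the lowercase ASCII letters 'a' to 'z', and the names StopwordTrie and remove_stopwords are made up for this example; it also uses std::string and std::vector instead of the char arrays and strcpy from the question.

```cpp
#include <array>
#include <memory>
#include <string>
#include <vector>

// Minimal trie sketch. Assumption: words use only lowercase ASCII 'a'..'z'.
class StopwordTrie {
    struct Node {
        std::array<std::unique_ptr<Node>, 26> children;  // one slot per letter
        bool is_word = false;  // true if a stopword ends at this node
    };
    Node root_;

    static bool is_lower(char c) { return c >= 'a' && c <= 'z'; }

public:
    // Adds a stopword. Returns false (and stores nothing) if the word
    // contains a character outside 'a'..'z'.
    bool insert(const std::string& word) {
        for (char c : word) {
            if (!is_lower(c)) return false;
        }
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c - 'a'];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->is_word = true;
        return true;
    }

    // Walks the tree one character at a time; the cost depends on the
    // length of the word, not on how many stopwords are stored.
    bool contains(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            if (!is_lower(c)) return false;
            node = node->children[c - 'a'].get();
            if (!node) return false;
        }
        return node->is_word;
    }
};

// Returns the words that are not stopwords, in their original order.
std::vector<std::string> remove_stopwords(const std::vector<std::string>& words,
                                          const StopwordTrie& stopwords) {
    std::vector<std::string> kept;
    for (const std::string& word : words) {
        if (!stopwords.contains(word)) kept.push_back(word);
    }
    return kept;
}
```

You would call insert once for every entry in the dictionary and then pass the word list to remove_stopwords. A prefix of a stopword (for example "th" when only "the" is stored) is not treated as a stopword, because only nodes marked is_word count as matches.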

See Wikipedia: http://en.wikipedia.org/wiki/Trie
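The hash-based alternative raised in the question is also worth considering. The following is only a sketch under assumptions: it keeps the stopwords in a std::unordered_set<std::string>, the function name filter_stopwords is hypothetical, and the words are held as std::string rather than char arrays.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the words that are not in the stopword set, in their original order.
// Each lookup is a hash-table probe (constant time on average) instead of a
// linear scan through a vector.
std::vector<std::string> filter_stopwords(
    const std::vector<std::string>& words,
    const std::unordered_set<std::string>& stopwords) {
    std::vector<std::string> kept;
    kept.reserve(words.size());  // upper bound on the number of kept words
    for (const std::string& word : words) {
        if (stopwords.find(word) == stopwords.end()) {
            kept.push_back(word);  // not a stopword: keep it
        }
    }
    return kept;
}
```

With this version there is still one loop over the word list, but the inner search no longer grows with the size of the dictionary. Whether it or a trie is faster in practice depends on the data, so it is worth measuring both.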
