How to remove stop words more efficiently in C++

Question

I have a stop-word dictionary and a word list that needs to be processed. How can I write the code so that it runs more efficiently?

My current code loads the dictionary into memory as a vector (stopword), then iterates over the word list and checks whether each word is in stopword. If it is not, the word is copied with strcpy into the new word list.

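for (i = 0; i < len; i++)
{
    // std::find (from <algorithm>) does a linear scan of the vector for every word
    if (std::find(stopword.begin(), stopword.end(), a[i]) == stopword.end())
        strcpy(new_word, a[i]);  // not a stop word, so copy it to the output
}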

But doesn't this code effectively need two loops? Is there any other way to improve it, for example using a hash instead of a vector?

Answer 1

You can store your stop words in a structure called a trie. It is a prefix-based tree that lets you check a word against all stop words at once, character by character.

See Wikipedia: http://en.wikipedia.org/wiki/Trie
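
To make the trie idea concrete, here is a minimal sketch. The names TrieNode, StopwordTrie and remove_stopwords are made up for illustration, and it assumes the words are held as std::string rather than the char arrays in the question, so treat it as one possible shape rather than a drop-in replacement.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical node type: one node per character on a path from the root.
struct TrieNode {
    std::unordered_map<char, std::unique_ptr<TrieNode>> children;
    bool is_word = false;  // true if a stop word ends exactly at this node
};

// Hypothetical wrapper that stores the stop words in a trie.
class StopwordTrie {
public:
    // Walk down from the root, creating missing nodes, and mark the last one.
    void insert(const std::string& word) {
        TrieNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child.reset(new TrieNode());
            node = child.get();
        }
        node->is_word = true;
    }

    // Follow the word character by character; fail as soon as a branch is missing.
    bool contains(const std::string& word) const {
        const TrieNode* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        return node->is_word;
    }

private:
    TrieNode root_;
};

// Return the words that are not stop words, in their original order.
std::vector<std::string> remove_stopwords(const std::vector<std::string>& words,
                                          const StopwordTrie& stopwords) {
    std::vector<std::string> kept;
    for (const std::string& w : words) {
        if (!stopwords.contains(w)) kept.push_back(w);
    }
    return kept;
}
```

You would call insert once per dictionary entry and then pass the word list to remove_stopwords. Each lookup costs time proportional to the length of the word, not the size of the dictionary, so the inner scan over the vector goes away. A std::unordered_set<std::string> would also give average constant-time lookups if you prefer the hash approach mentioned in the question.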

0 2013-12-10 03:58:31
