Which of these two solutions is more efficient? (Java Hashset)

I have a text file full of words. I want to add each of these words to a hashset. I also have a hashset of words I do not want.

Is it more efficient to:

  • (A) Add all the words to the hashset I want and remove the hashset of words I do not want at the end.
  • (B) Check if each word is in the hashset of words I do not want and if it is, ignore it. If it is not then add it to the set of words I do want.

Edit
There are far more words I want than words I do not want.
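For concreteness, here is a minimal sketch of what the two options might look like in Java. The class name `WordFilterSketch`, the method names `optionA` and `optionB`, and the sample words are made up for illustration and are not from the original post; `List.of` and `Set.of` assume Java 9 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; this only illustrates the two approaches.
final class WordFilterSketch {

    // Option A: add every word, then remove the unwanted words at the end.
    static Set<String> optionA(Iterable<String> words, Set<String> unwanted) {
        Set<String> wanted = new HashSet<>();
        for (String word : words) {
            wanted.add(word);
        }
        wanted.removeAll(unwanted);
        return wanted;
    }

    // Option B: look each word up in the unwanted set and add it only if absent.
    static Set<String> optionB(Iterable<String> words, Set<String> unwanted) {
        Set<String> wanted = new HashSet<>();
        for (String word : words) {
            if (!unwanted.contains(word)) {
                wanted.add(word);
            }
        }
        return wanted;
    }

    public static void main(String[] args) {
        List<String> words = List.of("the", "quick", "brown", "fox", "the", "end");
        Set<String> unwanted = Set.of("the", "end");
        // Both options should produce the same set of wanted words.
        System.out.println(optionA(words, unwanted).equals(optionB(words, unwanted))); // prints "true"
    }
}
```

Both methods return the same set; they differ only in when the unwanted words are filtered out.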

The answer depends completely on the size of your lists. If you have 99999 words you don't want and 1 word you do, you should do option A. If you have 99999 words you want and 1 word you don't, you should do option B.

The reasoning is that option B performs a lookup in the set of undesired words every time it reads a word, so the smaller that set is, the cheaper option B tends to be. Note that each lookup in a HashSet takes expected constant time; it does not scan the entire set.

From a purely theoretical view, both options have the same worst-case time complexity, but in practice there can be a big difference.
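If you want to see the practical difference on your own data, a rough timing sketch along these lines may help. Everything here is an assumption for illustration: the class and method names, the synthetic words "w0", "w1", ..., and the sizes (1,000,000 words, 1,000 of them unwanted). It is not a rigorous benchmark; JIT warm-up and garbage collection can skew the numbers, so a proper harness such as JMH would be more trustworthy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names and sizes; a rough comparison, not a rigorous benchmark.
final class FilterTimingSketch {

    // Option A: add everything, then remove the unwanted words.
    static Set<String> addThenRemove(List<String> words, Set<String> unwanted) {
        Set<String> wanted = new HashSet<>(words);
        wanted.removeAll(unwanted);
        return wanted;
    }

    // Option B: check each word before adding it.
    static Set<String> checkThenAdd(List<String> words, Set<String> unwanted) {
        Set<String> wanted = new HashSet<>();
        for (String word : words) {
            if (!unwanted.contains(word)) {
                wanted.add(word);
            }
        }
        return wanted;
    }

    // Synthetic input: the distinct words "w0" .. "w(count-1)".
    static List<String> makeWords(int count) {
        List<String> words = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            words.add("w" + i);
        }
        return words;
    }

    // The first `count` synthetic words are treated as unwanted.
    static Set<String> makeUnwanted(int count) {
        return new HashSet<>(makeWords(count));
    }

    // Wall-clock time of a single run, in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> words = makeWords(1_000_000);
        Set<String> unwanted = makeUnwanted(1_000);

        // A few untimed runs so the JIT has a chance to compile both methods.
        for (int i = 0; i < 5; i++) {
            addThenRemove(words, unwanted);
            checkThenAdd(words, unwanted);
        }

        long a = timeNanos(() -> addThenRemove(words, unwanted));
        long b = timeNanos(() -> checkThenAdd(words, unwanted));
        System.out.printf("A: %d ms, B: %d ms%n", a / 1_000_000, b / 1_000_000);
    }
}
```

Changing the two sizes in `main` to match the proportions in your own files is the quickest way to check which option suits your data.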

So basically, as with most solutions, the efficiency depends on how you expect your data to be structured.
