
How can I efficiently search a string for occurrences of words?

Essentially, I have a Set of about 250,000 words and want to return a list of the ones that are found in a given string.

E.g., if the input string is 'APPLEASEDITION', I want to return:

[APP, APPLE, PLEA, PLEAS, PLEASE, PLEASED, LEA, LEAS, LEASE, LEASED, EA, EAS, EASE, EASED, AS, SEDITION, EDITION, IT, TI, ON]

I came up with this code, which is faster than the method mentioned above for shorter input strings (up to 15 characters), but its execution time doubles with each added letter:

const findWords = (instring, solutions = null) => {
  if (!solutions) solutions = new Set();
  if (!instring) {
    return new Set();
  }
  if (words[instring]) {
    solutions.add(instring);
  }
  const suffix = instring.slice(1);
  const prefix = instring.slice(0, instring.length - 1);

  if (!solutions.has(prefix))
    solutions = new Set([...solutions, ...findWords(prefix, solutions)]);
  if (!solutions.has(suffix))
    solutions = new Set([...solutions, ...findWords(suffix, solutions)]);
  return solutions;
};

Can anyone help me optimize this code?

  1. As it stands, your logic works inward from the start and end of the input. To be sure words in the middle are covered, you'll need to generate every candidate substring of the input (each contiguous run of characters; strictly speaking these are substrings, not permutations).

  2. Convert your dictionary to a hash whose keys are the words. A lookup drops from O(n) for an array scan to O(1) on average, and you can test a candidate by checking dictionary[possibleWord].

  3. You could convert your array of dictionary words into a binary search tree or a trie. There might also be a performance benefit in converting your source text into a collection of BSTs/tries, each representing a possible word/substring, and then comparing those rather than strings, but at the moment I'm not sure how that would be faster than string comparison.
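
The trie idea from point 3 could look roughly like the sketch below. This is only an illustration under assumptions, not the asker's code or any library's API: the names `buildTrie`, `findWordsWithTrie`, `END`, and the tiny sample word list are all made up here. The dictionary goes into the trie once; the input is then walked from every start position, and the walk stops as soon as no dictionary word continues with the characters seen so far.

```javascript
// Marker key for "a complete word ends at this node" (hypothetical name).
const END = Symbol("end");

// Build a trie from an iterable of words. Each node is a Map from a
// character to its child node.
const buildTrie = (wordList) => {
  const root = new Map();
  for (const word of wordList) {
    let node = root;
    for (let i = 0; i < word.length; i++) {
      const ch = word[i];
      if (!node.has(ch)) node.set(ch, new Map());
      node = node.get(ch);
    }
    node.set(END, true);
  }
  return root;
};

// Collect every dictionary word that occurs somewhere in `input`.
const findWordsWithTrie = (input, trie) => {
  const found = new Set();
  for (let start = 0; start < input.length; start++) {
    let node = trie;
    for (let end = start; end < input.length; end++) {
      node = node.get(input[end]);
      if (!node) break; // no dictionary word continues with this prefix
      if (node.has(END)) found.add(input.slice(start, end + 1));
    }
  }
  return found;
};

// Sample data only; the real dictionary would hold about 250,000 words.
const trieSampleWords = ["APP", "APPLE", "PLEASE", "EASE", "IT", "ON"];
console.log([...findWordsWithTrie("APPLEASEDITION", buildTrie(trieSampleWords))]);
```

With this layout the inner loop runs at most as far as the longest matching prefix, so the work per start position is bounded by the longest dictionary word rather than by the input length.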

You can cap the length of each candidate substring at the length of the longest word in your dictionary. You'll still end up with a lot of candidates, but possibly fewer than you have currently.
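
A minimal sketch of that idea, assuming the dictionary is a JavaScript `Set`: every start position is paired with every end position up to the longest word length, and each slice is looked up once. The names `findWordsBySubstring`, `sampleDictionary`, and `longestWordLength` are hypothetical, and the sample words stand in for the real list.

```javascript
// Sample data only; the real dictionary would hold about 250,000 words.
const sampleDictionary = new Set([
  "APP", "APPLE", "PLEA", "PLEASE", "EASE", "EDITION", "SEDITION", "ON",
]);

// Length of the longest dictionary word; no longer slice can match.
// A loop is used instead of Math.max(...spread) so a very large
// dictionary cannot exceed the engine's argument limit.
let longestWordLength = 0;
for (const word of sampleDictionary) {
  longestWordLength = Math.max(longestWordLength, word.length);
}

// Check each contiguous slice of `input` (up to `maxLen` characters long)
// against the dictionary. Each slice is visited exactly once.
const findWordsBySubstring = (input, dictionary, maxLen) => {
  const found = new Set();
  for (let start = 0; start < input.length; start++) {
    const limit = Math.min(input.length, start + maxLen);
    for (let end = start + 1; end <= limit; end++) {
      const candidate = input.slice(start, end);
      if (dictionary.has(candidate)) found.add(candidate);
    }
  }
  return found;
};

console.log([...findWordsBySubstring("APPLEASEDITION", sampleDictionary, longestWordLength)]);
```

For an input of length n and a longest word of length m, this performs at most n × m lookups, so the cost grows linearly with the input instead of doubling with each added letter.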

As the comments state, you may want to do this server-side for more power, in a language more efficient than JS, or with WASM.

Some JavaScript libraries that have binary search tree tools:

  4. Alternatively, you might be able to create two hashes (one of candidate substrings, one of dictionary words), or use another data structure that's made for computing a "diff" or "overlap", and extract the keys that are in both sets.
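
The "overlap" idea could be sketched like this, again with hypothetical names (`allSubstrings`, `intersect`) and made-up sample data. It collects every contiguous slice of the input into one `Set`, then keeps the entries that also appear in the dictionary `Set`, iterating over whichever set is smaller. Newer JavaScript engines also provide a built-in `Set.prototype.intersection`; the manual version is shown here so the sketch does not depend on it.

```javascript
// Collect every contiguous slice of `input` into a Set (duplicates collapse).
const allSubstrings = (input) => {
  const subs = new Set();
  for (let start = 0; start < input.length; start++) {
    for (let end = start + 1; end <= input.length; end++) {
      subs.add(input.slice(start, end));
    }
  }
  return subs;
};

// Return the items present in both sets, scanning the smaller one.
const intersect = (a, b) => {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  return new Set([...small].filter((item) => large.has(item)));
};

// Sample data only.
const overlapDictionary = new Set(["ON", "ONION", "NO", "IO", "CAT"]);
console.log([...intersect(allSubstrings("ONION"), overlapDictionary)]);
```

Note that building the full substring set takes memory proportional to the square of the input length, so for long inputs a direct scan against the dictionary is likely the cheaper option.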


 