
Find all words in dictionary given a string of letters

I am attempting to write a program that will find all the words that can be constructed from a given string, using a dictionary which has been loaded into an ArrayList from a file. sowpodsList is the dictionary stored as an ArrayList. I want to iterate through each word in the dictionary and compare it to the string. Given that the string is just a random collection of letters, how do I go about achieving this?

Input: asdm

Output: a, mad, sad .... (any word which matches in the dictionary.)

for (int i = 0; i < sowpodsList.size(); i++) {
    for (int j = 0; j < sowpodsList.get(i).length(); j++) {
        if (sowpodsList.get(i).charAt(j) ==   )
            ;
    }
}

You can check, for each word in the dictionary, that no character occurs more often in the word than it does in the input.

        ArrayList<String> matches = new ArrayList<String>();

        // for each word in dict
        for (String word : sowpodsList) {

            // match flag, cleared as soon as a character is over-used
            boolean isMatch = true;

            // for each character of dict word
            for (char chW : word.toCharArray()) {

                String w = Character.toString(chW);

                // if chW occurs more often in word than in input,
                // the word cannot be built from the input
                if (word.length() - word.replace(w, "").length() >
                    input.length() - input.replace(w, "").length()) {
                    isMatch = false;
                    break;
                }
            }
            if (isMatch) {
                matches.add(word);
            }
        }

        System.out.println(matches);

Sample output: (dict file I used is here: https://docs.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt )

Input: asdm
Matches: [ad, ads, am, as, dam, dams, ma, mad, mads, mas, sad]

If I were you I'd change the way you store your dictionary.

Given that the string input has random letters in it, what I'd do here is store all words of your dictionary in a SortedMap<String, char[]> (a TreeMap , to be precise) where the keys are the words in your dictionary and the values are characters in this word sorted .
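A minimal sketch of building such a map, assuming the words are already available as a List<String>. The class name SortedLetterIndex and the method name build are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: maps each dictionary word to its letters in sorted order.
class SortedLetterIndex {

    static SortedMap<String, char[]> build(List<String> words) {
        SortedMap<String, char[]> dictionary = new TreeMap<>();
        for (String word : words) {
            char[] letters = word.toCharArray();
            Arrays.sort(letters);          // "sad" becomes {'a', 'd', 's'}
            dictionary.put(word, letters);
        }
        return dictionary;
    }
}
```

The sorting is done once, when the dictionary is loaded, so each later query only has to sort the input string.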

Then I'd sort the characters in the input string as well. A word can be built from the input exactly when its sorted letters form a subsequence of the sorted input, which a single two-pointer scan can check (pseudo code, not tested; dictionary is the map described above):

// needs java.util.Arrays, java.util.HashSet, java.util.Map, java.util.Set
public Set<String> getMatchingWords(final String input)
{
    final char[] contents = input.toCharArray();
    Arrays.sort(contents);

    final Set<String> matchedWords = new HashSet<>();

    for (final Map.Entry<String, char[]> entry: dictionary.entrySet()) {
        final char[] candidate = entry.getValue();
        // If the word has a greater length than the input,
        // go for the next word
        if (candidate.length > contents.length)
            continue;
        // We only add a match if every character of the candidate
        // was found in the input
        if (isSubsequence(candidate, contents))
            matchedWords.add(entry.getKey());
    }

    return matchedWords;
}


// Both arrays must be sorted. Returns true if every character of
// candidate can be paired with a distinct character of input.
private static boolean isSubsequence(final char[] candidate, final char[] input)
{
    int matched = 0;
    for (int i = 0; i < input.length && matched < candidate.length; i++) {
        if (input[i] == candidate[matched])
            matched++;
    }
    return matched == candidate.length;
}

With a trie: that would also be possible; whether it is practical is another question, since it depends on the size of the dictionary.

But the basic principle would be the same: you'd need the sorted character array of each word in your dictionary, and you'd add those arrays to the trie one by one (use a builder).

A trie node would have two elements:

  • a map where the keys are the set of characters which can be matched next, and the values are the matching trie nodes;
  • a set of words which can match at that node exactly.

You can base your trie implementation off this one if you want.
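A minimal sketch of the node structure described above, assuming each word is inserted along the path spelled by its sorted letters. The names SortedLetterTrie, add and wordsFrom are hypothetical, and this is one possible reading of the idea rather than the linked implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical trie keyed on the sorted letters of each word.
class SortedLetterTrie {

    private static final class Node {
        // characters that can be matched next, with their nodes
        final Map<Character, Node> children = new HashMap<>();
        // words whose sorted letters end exactly at this node
        final Set<String> words = new TreeSet<>();
    }

    private final Node root = new Node();

    void add(String word) {
        if (word.isEmpty()) return;
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        Node node = root;
        for (char c : letters) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.words.add(word);
    }

    // Returns every stored word that can be built from the letters of input.
    Set<String> wordsFrom(String input) {
        char[] letters = input.toCharArray();
        Arrays.sort(letters);
        Set<String> result = new TreeSet<>();
        collect(root, letters, 0, result);
        return result;
    }

    // Follows every path that is a subsequence of letters[start..].
    private void collect(Node node, char[] letters, int start, Set<String> result) {
        result.addAll(node.words);
        for (int i = start; i < letters.length; i++) {
            // a repeated letter at the same depth would retrace the same child
            if (i > start && letters[i] == letters[i - 1]) continue;
            Node child = node.children.get(letters[i]);
            if (child != null) collect(child, letters, i + 1, result);
        }
    }
}
```

Anagrams such as "mad" and "dam" end up in the same node, so one path through the trie can yield several words at once.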

Go for a trie implementation.

A trie provides fast lookup over a large collection of words.

https://en.wikipedia.org/wiki/Trie

What you need to do is insert all the words into the trie data structure.

Then you just need to call the trie's search function to get a boolean match result.
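A minimal sketch of such a trie, assuming plain insertion by the word's own spelling. The names WordTrie, insert, contains and wordsFrom are hypothetical. Besides the boolean lookup, it includes a depth-first walk that only follows letters still available in the input, which is one way to enumerate the matching words without generating candidates first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical character trie with insert, exact lookup, and a constrained walk.
class WordTrie {

    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Boolean match: true only if the exact word was inserted.
    boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return node.isWord;
    }

    // Lists every inserted word that can be spelled with the letters of input.
    List<String> wordsFrom(String input) {
        Map<Character, Integer> available = new HashMap<>();
        for (char c : input.toCharArray()) available.merge(c, 1, Integer::sum);
        List<String> result = new ArrayList<>();
        walk(root, available, new StringBuilder(), result);
        return result;
    }

    private void walk(Node node, Map<Character, Integer> available,
                      StringBuilder prefix, List<String> result) {
        if (node.isWord && prefix.length() > 0) result.add(prefix.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            char c = e.getKey();
            int left = available.getOrDefault(c, 0);
            if (left == 0) continue;          // letter used up or absent
            available.put(c, left - 1);       // consume one copy of the letter
            prefix.append(c);
            walk(e.getValue(), available, prefix, result);
            prefix.setLength(prefix.length() - 1);
            available.put(c, left);           // restore it when backtracking
        }
    }
}
```

Because the children are kept in a TreeMap, the walk visits letters in alphabetical order and the result list comes back sorted.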

There are two ways to do it. The best way depends on the relative size of the data structures.

If the dictionary is long and the list of letters is short, it may be best to sort the dictionary (if it is not already sorted), then construct every candidate string by permuting each subset of the letters (removing duplicates). Then do a binary search using string comparison for each candidate to see if it is a word in the dictionary. The tricky part is ensuring that duplicate letters are used only when appropriate.
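A minimal sketch of this first approach, assuming the dictionary fits in memory as a List<String>. The class name PermutationSearch and its method names are hypothetical. Duplicate letters are handled by sorting the letters and refusing to pick a repeated letter before its earlier copy has been used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: generate each distinct arrangement, then binary-search it.
class PermutationSearch {

    static List<String> findWords(String input, List<String> dictionary) {
        List<String> sortedDict = new ArrayList<>(dictionary);
        Collections.sort(sortedDict);
        char[] letters = input.toCharArray();
        Arrays.sort(letters);
        List<String> found = new ArrayList<>();
        permute(letters, new boolean[letters.length], new StringBuilder(),
                sortedDict, found);
        return found;
    }

    private static void permute(char[] letters, boolean[] used, StringBuilder prefix,
                                List<String> sortedDict, List<String> found) {
        // every non-empty prefix is itself a candidate word
        if (prefix.length() > 0
                && Collections.binarySearch(sortedDict, prefix.toString()) >= 0) {
            found.add(prefix.toString());
        }
        for (int i = 0; i < letters.length; i++) {
            if (used[i]) continue;
            // use equal letters in index order so no arrangement is built twice
            if (i > 0 && letters[i] == letters[i - 1] && !used[i - 1]) continue;
            used[i] = true;
            prefix.append(letters[i]);
            permute(letters, used, prefix, sortedDict, found);
            prefix.setLength(prefix.length() - 1);
            used[i] = false;
        }
    }
}
```

The number of arrangements grows factorially with the number of letters, so this sketch is only reasonable for short inputs.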

If the list of letters is long and the dictionary is short, another way would be simply to count how often each letter occurs in the input string: two a's, one s, one m, etc. Then for each dictionary word, if the count of each individual letter in the dictionary word does not exceed its count in the input string, the word is valid.
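A minimal sketch of the counting approach, assuming the words use only the letters a to z (case is ignored). The class name LetterCountMatcher and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: compares per-letter counts of each word with the input.
class LetterCountMatcher {

    // Counts 'a'..'z'; returns null if s contains any other character.
    static int[] countLetters(String s) {
        int[] counts = new int[26];
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z') return null;
            counts[c - 'a']++;
        }
        return counts;
    }

    static List<String> findWords(String input, List<String> dictionary) {
        List<String> matches = new ArrayList<>();
        int[] available = countLetters(input);
        if (available == null) return matches;   // assumption: letters only
        for (String word : dictionary) {
            int[] needed = countLetters(word);
            if (needed == null || word.isEmpty()) continue;
            boolean fits = true;
            for (int i = 0; i < 26; i++) {
                if (needed[i] > available[i]) {  // word over-uses this letter
                    fits = false;
                    break;
                }
            }
            if (fits) matches.add(word);
        }
        return matches;
    }
}
```

Each word costs one pass over its letters plus a fixed 26-entry comparison, so the total work is linear in the size of the dictionary.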

Either way, add all words found to the output array.
