简体   繁体   中英

Best way to hash sets of three integers

I'm trying to write a program that quickly finds all words that match a string with wildcards and known letters. For example, L*G would return LOG, LAG, LEG. I'm looking for something that would give me very fast lookup, but I don't care about time it takes to create the tree in the first place.

My idea is a Hashmap of "Triples" mapping to an ArrayList of Strings: basically, a list of all Strings that match the criteria for a certain index, character at that index, and length.

But my issue now is generating a good hash function for these "Triples" such that every triplet is unique.

Here's my code for what I have right now.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class CWSolutionT {
HashMap<Triple, ArrayList<String>> tripleMap = new HashMap<Triple, ArrayList<String>>();

public CWSolutionT(List<String> allWords) {
    for (String word : allWords) {
        for (int letter = 0; letter < word.length(); letter++) {
            ArrayList<String> tempList = new ArrayList<String>();
            Triple key = new Triple(letter, word.charAt(letter),
                    word.length());
            if (tripleMap.get(key) == null) {
                tempList.add(word);
                tripleMap.put(key, tempList);
            } else {
                tempList = tripleMap.get(key);
                tempList.add(word);
                tripleMap.put(key, tempList);
            }
        }
    }
}

public List<String> solutions(String pattern, int maxRequired) {
    List<String> sol = new ArrayList<String>();
    List<List<String>> solList = new ArrayList<List<String>>();
    int length = pattern.length();
    for (int i = 0; i < length; i++) {
        if (pattern.charAt(i) != '*') {
            Triple key = new Triple(i, pattern.charAt(i), pattern.length());
            solList.add(tripleMap.get(key));
        }
    }
    if (solList.size() == 0) {
        // implement later
    }

    if (solList.size() == 1)
        return solList.get(0);

    for (List<String> list : solList) {
        sol.retainAll(list);
    }
    return sol;
}

private class Triple {
    public final int index;
    public final char letter;
    public final int length;

    public Triple(int ind, char let, int len) {
        index = ind;
        letter = let;
        length = len;
    }

    public boolean equals(Object o) {
        if (o == null)
            return false;
        if (o == this)
            return true;
        if (!(o instanceof Triple)) {
            return false;
        }
        Triple comp = (Triple) o;
        if (this.hashCode() == comp.hashCode())
            return true;
        return false;
    }

    public String toString() {
        return "(" + index + ", " + letter + ", " + length + ")";
    }

    public int hashCode() {
        return (int) (.5 * (index + letter + length)
                * (index + letter + length + 1) + letter + length);
    }
}
}

您还必须在Tuple类中重写hashCode()

HashMap requires an implementation of equals and hashCode and you haven't overridden either of them. Right now your map is basically only checking object identify (someOldObject == someNewObject).

It looks like you've attempted to implement an equals method but the signature is wrong, Object#equals takes an Object as a parameter and yours takes a Tuple so it is instead overloaded . Answers for this question have some tips for implementing equals and hashCode: What issues should be considered when overriding equals and hashCode in Java? .

In general, it's impossible to squeze two int s into a single one. Cantor pairing function works for an infinite domain, but transferring it to int s leads nowhere.

Your input is a triple (int, char, int) , which theoretically accounts for 2**32 * 2**16 * 2**32 = 2**80 possibilities, which obviously can't be injectively mapped onto a set containing 2**32 elements only.

With the help of some knowledge about the input values you can do better, but possibly not good enough. For example, you know that both index and length are non-negative, which makes your domain four times smaller... but that's nothing.

If you knew that both index and length are less than 256, you could do the mapping via

public int hashCode() {
    return index + 256 * length + 256 * 256 * letter;
}

This expression can be written more efficiently as

index + (length << 8) + (letter << 16)

but you can leave it to the JIT compiler.

It's well possible that you can't reduce the domain sufficiently and then I'd suggest to let it be. Just compare the fields... you could try to fit it in a long, but isn't it all the root of all evil ?

Note that your

public int hashCode() {
    return (int) (.5 * (index + letter + length)
            * (index + letter + length + 1) + letter + length);
}

is too bad.... you're using a pairing of three things ! The tuples (0, 'A', 1) and (0, 'B', 0) have the same hashCode and thus your equals return true. You're actually Cantor-pairing index and letter + length while you'd have to use two pairings.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM