简体   繁体   中英

Datastrutcture for finding shortest path between two strings

I am creating a program that will take a wordlist of 5 000 strings and find the shortest path from one string to another. For example abc -> bac could print "abc, bbc, bac".

I am pretty sure about what I want to do, the only thing I'm not completely sure about is what datastructure should represent my wordlist. The goal is for the search(BFS) to run as fast as possible, so to sacrifice some space is no problem. I am thinking either a BST or an adjacency list, but since I'm no expert at datastrutcutres' timecomplexity I want to be certain before I start adjusting my code. Can anyone recommend one of the structures over the other? Or have I perhaps missed a datastructure that is an obvious alternative for this?

Looks like what you are looking for is the Levenshtein distance , here is the Rosetta code implementation , you should be able to change it to suit your need:

public class Levenshtein {

    public static int distance(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        // i == 0
        int [] costs = new int [b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            // j == 0; nw = lev(i - 1, j)
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    public static void main(String [] args) {
        String [] data = { "kitten", "sitting", "saturday", "sunday", "rosettacode", "raisethysword" };
        for (int i = 0; i < data.length; i += 2)
            System.out.println("distance(" + data[i] + ", " + data[i+1] + ") = " + distance(data[i], data[i+1]));
    }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM