简体   繁体   中英

Which data structure to use in Java for the following scenario?

I have a set of Elements which are prefixes. I've to write a Java method in such a way that whenever it gets an input verify in which prefix it belongs to and return true or false if it matches with any prefixes. Elements are

1900
1901
1902
1903
17082

Input will be

1900123445
1901334455
1800777777
...

Which Data Structure I can use so that it'll not hit the performance. Because the input numbers will be up to 50million at once.

Can anyone help me in this? Thanks in advance.

If possible, test each of these methods and see which is the fastest in context. It will probably depend on a variety of factors, and I can't really predict which one will work best in your specific use case.

Option 0 (use this if you have very few prefixes)

The easiest option: store your prefixes in a linked list and check against each one using input.startsWith(prefix) . Boring.

Option 1 (use this if there are non-numerical prefixes)

Let k be the minimum prefix length. Use a HashMap where the keys are the first k digits of each prefix and the items are linked lists containing the rest of each prefix .

For instance, say you have prefixes abcd , abce , and xyz . Then you would store the following:

  • "abc" --> ("d","e") , where ("d","e") is a linked list containing the elements "d" and "e"
  • "xyz" --> ("") (where "" is the empty string).

Call this map prefixes , and use the following code to determine if a prefix is correct:

public boolean correctPrefix(String input){
    LinkedList check = prefix.get(input.substring(0,k))
    if (check != null){
        for (String n : check){
            if (input.substring(k).startsWith(check)) return true;
        }
    return false;
}

I don't know if it will be fast enough for your purposes, though you haven't told us exactly what those are; still, I don't know of anything faster in Java.

Option 2 (use this if all prefixes are numerical, or you're using SE7)

Use a switch statement. Or rather, use multiple switch statements, one for each possible prefix length. For instance, suppose you have the prefixes 1901 , 1902 , and 20050 :

public boolean correctPrefix(String input){
    int pVal;
    pVal = Integer.parseInt(input.substring(0,4));
    switch (pval){
    case 1901: return true;
    case 1902: return true;
    }
    pVal = Integer.parseInt(input.substring(0,5));
    switch (pval){
    case 20050: return true;
    }
    return false;
}

This will be a lot more code, but I suspect it will be much faster assuming you have enough prefixes with the same length. Note that if a switch statement doesn't have very many possible cases, it won't actually be compiled as a true switch statement but as a series of if/else blocks, which will cause it to be fairly slow. You should do some mucking about with this, though, and see what you get; it might be worthwhile to throw in some bogus case [wrongprefix]: return false; statements in, because believe it or not they can actually speed things up.

Actually, as of SE7, switch statements can be used with strings. I'm not sure how efficient this is, but it's an option.

Alternatively, if you're using something prior to SE7, you could try....

Option 3 (how to cheat)

You can actually pass a radix to parseInt , meaning that if you have letters in your prefixes but they're case-insensitive, you can use Integer.parseInt(input.substring(0,4),36) to get a valid integer value, which you can then use with switch .

Your best bet would be to store the prefix in a HashMap and the value should be a linked list. I am assuming the same prefix can get tagged with various numbers like 19011234 190123444.

Are the prefixes the same length?

If so, you could hash all the prefixes.

When you encounter a new input you would extract the first x digits.

You would look for the presence of the first x digits in the hash - an O(1) operation if the hash is designed well.

If you only have (relatively) small integers, you could use a sparse array of booleans, where your prefixes will act as indexes.

Alternatively, a HashSet would be a good choice.

Create a HashSet and add these 4 values in it. and then parse your input values to get first four integers and call HashSet contains() method, that will return true/false.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM