
Sorting collections with string elements in Java

I have a collection, and I wrote code to sort its values using my own comparator. My comparator code is:

private static class MatchComparator implements Comparator<xmlparse> {
    @Override
    public int compare(xmlparse object1, xmlparse object2) {
        String match1 = object1.getMatchId();
        String match2 = object2.getMatchId();
        return match1.compareTo(match2);
    }
}

I call Collections.sort(list, new MatchComparator());

Everything runs fine, but the sorted list is in the wrong order when I print it.

Input list:

Match19
Match7
Match12
Match46
Match32

Output from the sorted list:

Match12
Match19
Match32
Match46
Match7

My expected output is:

Match7
Match12
Match19
Match32
Match46

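To get the order you want, you can either pad single-digit numbers with a leading zero (for example Match07), or split each string into its prefix and its numeric part and implement the sort as a numeric comparison.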
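The second option (splitting into prefix and number) could look roughly like the sketch below. It is only an illustration under stated assumptions: every ID is a run of non-digits followed by a run of digits, the class name PrefixNumberComparator and the helper sorted() are hypothetical, and it compares plain strings rather than xmlparse objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: orders IDs such as "Match7" by prefix, then by number.
class PrefixNumberComparator implements Comparator<String> {
    // Assumption: an ID is a run of non-digits followed by a run of digits.
    private static final Pattern ID = Pattern.compile("(\\D*)(\\d+)");

    @Override
    public int compare(String a, String b) {
        Matcher ma = ID.matcher(a);
        Matcher mb = ID.matcher(b);
        if (!ma.matches() || !mb.matches()) {
            // Reject input that does not fit the assumed format.
            throw new IllegalArgumentException("Unexpected ID: " + a + " / " + b);
        }
        // Compare the textual prefixes first.
        int byPrefix = ma.group(1).compareTo(mb.group(1));
        if (byPrefix != 0) {
            return byPrefix;
        }
        // Then compare the numeric parts by value, not character by character.
        return Long.compare(Long.parseLong(ma.group(2)),
                            Long.parseLong(mb.group(2)));
    }

    // Convenience method for trying the comparator on a few IDs.
    static List<String> sorted(String... ids) {
        List<String> list = new ArrayList<>(Arrays.asList(ids));
        list.sort(new PrefixNumberComparator());
        return list;
    }
}
```

Calling PrefixNumberComparator.sorted("Match19", "Match7", "Match12", "Match46", "Match32") should give the order Match7, Match12, Match19, Match32, Match46. Digit runs too long for a long would make Long.parseLong throw, which this sketch does not handle.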

The problem is that String.compareTo(..) compares the strings character by character.

If all the strings start with Match, you can easily fix this with:

public int compare(xmlparse object1, xmlparse object2) {
    String match1 = object1.getMatchId();
    String match2 = object2.getMatchId();
    // String.replace needs both the target and its replacement
    return Integer.compare(Integer.parseInt(match1.replace("Match", "")),
                           Integer.parseInt(match2.replace("Match", "")));
}

If they don't all start with Match, you can strip the non-numeric characters from the ID string with a regex:

Integer.parseInt(match1.replaceAll("[a-zA-Z]+", ""));

Or

Integer.parseInt(match1.replaceAll("[\\p{Alpha}\\p{Punct}]+", ""));

And a final note: name your classes in upper camel case, i.e. XmlParse instead of xmlparse. That's what the Java naming convention dictates.

The comparison is lexicographic, not numerical; that's your problem. In lexicographic ordering, 10 comes before 9.
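A small sketch of the difference, using a hypothetical class LexicographicDemo that is not part of the original code:

```java
class LexicographicDemo {
    // Compares character by character, exactly as String.compareTo does.
    static int lexicographic(String a, String b) {
        return a.compareTo(b);
    }

    // Parses both strings as integers and compares their values.
    static int numeric(String a, String b) {
        return Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
    }

    public static void main(String[] args) {
        // '1' is less than '9', so "10" sorts before "9" as a string.
        System.out.println(lexicographic("10", "9") < 0); // prints true
        // As numbers, 10 is greater than 9.
        System.out.println(numeric("10", "9") > 0);       // prints true
        // The same effect explains why "Match12" lands before "Match7".
        System.out.println(lexicographic("Match12", "Match7") < 0); // prints true
    }
}
```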

See this question for open-source implementations. You can also implement your own string comparison, which shouldn't be that hard.

Implement a function and use it for the comparison. Instead of

return match1.compareTo(match2);

use

return compareNatural(match1, match2, true); // third argument: caseSensitive

Here is a function which does a natural comparison on strings:

private static int compareNatural(String s, String t, boolean caseSensitive) {
    int sIndex = 0;
    int tIndex = 0;

    int sLength = s.length();
    int tLength = t.length();

    while (true) {
        // both character indices are after a subword (or at zero)

        // Check if one string is at end
        if (sIndex == sLength && tIndex == tLength) {
            return 0;
        }
        if (sIndex == sLength) {
            return -1;
        }
        if (tIndex == tLength) {
            return 1;
        }

        // Compare sub word
        char sChar = s.charAt(sIndex);
        char tChar = t.charAt(tIndex);

        boolean sCharIsDigit = Character.isDigit(sChar);
        boolean tCharIsDigit = Character.isDigit(tChar);

        if (sCharIsDigit && tCharIsDigit) {
            // Compare numbers

            // skip leading 0s
            int sLeadingZeroCount = 0;
            while (sChar == '0') {
                ++sLeadingZeroCount;
                ++sIndex;
                if (sIndex == sLength) {
                    break;
                }
                sChar = s.charAt(sIndex);
            }
            int tLeadingZeroCount = 0;
            while (tChar == '0') {
                ++tLeadingZeroCount;
                ++tIndex;
                if (tIndex == tLength) {
                    break;
                }
                tChar = t.charAt(tIndex);
            }
            boolean sAllZero = sIndex == sLength || !Character.isDigit(sChar);
            boolean tAllZero = tIndex == tLength || !Character.isDigit(tChar);
            if (sAllZero && tAllZero) {
                continue;
            }
            if (sAllZero && !tAllZero) {
                return -1;
            }
            if (tAllZero) {
                return 1;
            }

            int diff = 0;
            do {
                if (diff == 0) {
                    diff = sChar - tChar;
                }
                ++sIndex;
                ++tIndex;
                if (sIndex == sLength && tIndex == tLength) {
                    return diff != 0 ? diff : sLeadingZeroCount - tLeadingZeroCount;
                }
                if (sIndex == sLength) {
                    if (diff == 0) {
                        return -1;
                    }
                    return Character.isDigit(t.charAt(tIndex)) ? -1 : diff;
                }
                if (tIndex == tLength) {
                    if (diff == 0) {
                        return 1;
                    }
                    return Character.isDigit(s.charAt(sIndex)) ? 1 : diff;
                }
                sChar = s.charAt(sIndex);
                tChar = t.charAt(tIndex);
                sCharIsDigit = Character.isDigit(sChar);
                tCharIsDigit = Character.isDigit(tChar);
                if (!sCharIsDigit && !tCharIsDigit) {
                    // both number sub words have the same length
                    if (diff != 0) {
                        return diff;
                    }
                    break;
                }
                if (!sCharIsDigit) {
                    return -1;
                }
                if (!tCharIsDigit) {
                    return 1;
                }
            } while (true);
        } else {
            // Compare words
            // No collator specified. All characters should be ascii only. Compare character-by-character.
            do {
                if (sChar != tChar) {
                    if (caseSensitive) {
                        return sChar - tChar;
                    }
                    sChar = Character.toUpperCase(sChar);
                    tChar = Character.toUpperCase(tChar);
                    if (sChar != tChar) {
                        sChar = Character.toLowerCase(sChar);
                        tChar = Character.toLowerCase(tChar);
                        if (sChar != tChar) {
                            return sChar - tChar;
                        }
                    }
                }
                ++sIndex;
                ++tIndex;
                if (sIndex == sLength && tIndex == tLength) {
                    return 0;
                }
                if (sIndex == sLength) {
                    return -1;
                }
                if (tIndex == tLength) {
                    return 1;
                }
                sChar = s.charAt(sIndex);
                tChar = t.charAt(tIndex);
                sCharIsDigit = Character.isDigit(sChar);
                tCharIsDigit = Character.isDigit(tChar);
            } while (!sCharIsDigit && !tCharIsDigit);
        }
    }
}

A better one is here.

It seems you expect something different from what String.compareTo() actually does. It performs a so-called lexicographical comparison, whereas you want to compare by number. You need to modify the code of your comparator:

@Override
public int compare(xmlparse object1, xmlparse object2) {
    String match1 = object1.getMatchId();
    String match2 = object2.getMatchId();

    Long n1 = getNumber(match1);
    Long n2 = getNumber(match2);

    return n1.compareTo(n2);
}

where getNumber() extracts the trailing number from a string such as "MatchXX".
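The answer does not show getNumber() itself. One possible shape, as a sketch only: the class name MatchNumbers and the regex are assumptions, and the method expects the ID to end in a run of digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchNumbers {
    // Assumption: the number of interest is the run of digits at the end.
    private static final Pattern TRAILING_DIGITS = Pattern.compile("\\d+$");

    // Returns the trailing number of an ID such as "Match19" as a Long,
    // matching the Long n1 = getNumber(match1) usage in the comparator.
    static Long getNumber(String matchId) {
        Matcher m = TRAILING_DIGITS.matcher(matchId);
        if (!m.find()) {
            // No digits at the end: fail loudly instead of guessing a value.
            throw new IllegalArgumentException("No trailing number in: " + matchId);
        }
        return Long.valueOf(m.group());
    }
}
```

With this helper, getNumber("Match7") yields 7 and getNumber("Match12") yields 12, so n1.compareTo(n2) orders the IDs by numeric value.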


 