
equalsIgnoreCase doesn't conform to javadoc?

The javadoc for String.equalsIgnoreCase says:

Two strings are considered equal ignoring case if they are of the same length and corresponding characters in the two strings are equal ignoring case. Two characters c1 and c2 are considered the same ignoring case if at least one of the following is true:

The two characters are the same (as compared by the == operator)

Applying the method Character.toUpperCase(char) to each character produces the same result

Applying the method Character.toLowerCase(char) to each character produces the same result

So can anyone explain this?

public class Test
{
    private static void testChars(char ch1, char ch2) {
        boolean b1 = (ch1 == ch2 ||
                   Character.toLowerCase(ch1) == Character.toLowerCase(ch2) ||
                   Character.toUpperCase(ch1) == Character.toUpperCase(ch2));
        System.out.println("Characters match: " + b1);

        String s1 = Character.toString(ch1);
        String s2 = Character.toString(ch2);
        boolean b2 = s1.equalsIgnoreCase(s2);
        System.out.println("equalsIgnoreCase returns: " + b2);
    }

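    public static void main(String[] args) {
        // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE vs. U+0131 LATIN SMALL LETTER DOTLESS I
        testChars((char)0x0130, (char)0x0131);
        // U+03D1 GREEK THETA SYMBOL vs. U+03F4 GREEK CAPITAL THETA SYMBOL
        testChars((char)0x03d1, (char)0x03f4);
    }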
}

Output:

Characters match: false
equalsIgnoreCase returns: true
Characters match: false
equalsIgnoreCase returns: true

Those characters' definition of upper and lower case is probably locale-specific. From the JavaDoc for Character.toLowerCase():

In general, String.toLowerCase() should be used to map characters to lowercase. String case mapping methods have several benefits over Character case mapping methods. String case mapping methods can perform locale-sensitive mappings, context-sensitive mappings, and 1:M character mappings, whereas the Character case mapping methods cannot.

If you look at the String.toLowerCase() method you will find it is overloaded to accept a Locale object as well. That overload performs a locale-specific case conversion.
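To make the difference described in that JavaDoc passage concrete, here is a minimal sketch. The class name LocaleCaseDemo and its helper methods are made up for this example, and the Turkish locale and the German sharp s are just convenient inputs, not something taken from the question.

```java
import java.util.Locale;

final class LocaleCaseDemo {
    // Locale-sensitive mapping: under Turkish rules, 'I' lowercases to dotless i (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    // Locale-neutral mapping, for comparison.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The same input lowercases differently depending on the locale.
        System.out.println(lowerTurkish("TITLE").equals("t\u0131tle"));
        System.out.println(lowerRoot("TITLE").equals("title"));

        // 1:M mapping: the String method expands sharp s (U+00DF) to "SS",
        // while the Character method has to return a single char and leaves it unchanged.
        System.out.println("\u00df".toUpperCase(Locale.ROOT).equals("SS"));
        System.out.println(Character.toUpperCase('\u00df') == '\u00df');
    }
}
```

Note that equalsIgnoreCase takes no Locale, so none of this locale handling applies to it.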

Edit: To be clear: yes, the JavaDoc for String.equalsIgnoreCase() says what it says, but it is wrong. It cannot be correct in all cases: certainly not for characters represented as surrogate pairs, for example, and also not for characters whose upper/lower case is defined by the locale.
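A small sketch of the surrogate problem, under the assumption that the documented rule is applied literally to each char. The class SurrogateCaseDemo and its methods are hypothetical, and the Deseret letters U+10400 / U+10428 are simply one upper/lower pair outside the BMP. I am only showing the per-char rule here, not claiming what equalsIgnoreCase itself returns for these strings, which may depend on the JDK version.

```java
final class SurrogateCaseDemo {
    // The three per-char conditions from the equalsIgnoreCase JavaDoc.
    static boolean charRule(char a, char b) {
        return a == b
            || Character.toUpperCase(a) == Character.toUpperCase(b)
            || Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    // Applies charRule to each UTF-16 code unit, as the JavaDoc wording suggests.
    static boolean charWiseMatch(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!charRule(s.charAt(i), t.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // The same conditions, but on whole code points instead of chars.
    static boolean codePointRule(int a, int b) {
        return a == b
            || Character.toUpperCase(a) == Character.toUpperCase(b)
            || Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        // DESERET CAPITAL LETTER LONG I and DESERET SMALL LETTER LONG I
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));

        System.out.println("char-wise rule:  " + charWiseMatch(upper, lower));
        System.out.println("code-point rule: "
                + codePointRule(upper.codePointAt(0), lower.codePointAt(0)));
    }
}
```

The char-wise rule fails because the two low surrogates differ and an isolated surrogate has no case mapping; only the code-point version sees that the letters are a case pair.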

I found this in String.java (this snippet is also in a document that peter.petrov linked to):

        if (ignoreCase) {
            // If characters don't match but case may be ignored,
            // try converting both characters to uppercase.
            // If the results match, then the comparison scan should
            // continue.
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                continue;
            }
            // Unfortunately, conversion to uppercase does not work properly
            // for the Georgian alphabet, which has strange rules about case
            // conversion.  So we need to make one last check before
            // exiting.
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
        }

It's used by equalsIgnoreCase. What's interesting is that if it followed what the javadoc said, the line toward the bottom should be

            if (Character.toLowerCase(c1) == Character.toLowerCase(c2)) {

using c1 and c2 instead of u1 and u2. This changes the result for both test cases in the question. We can all agree that the javadoc is "wrong" in that it doesn't really reflect how case-folding should work; but the logic above doesn't handle case-folding any better, and on top of that it doesn't conform to the documentation.
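To see the two rules side by side, here is a minimal sketch. CaseCheckDemo, documentedRule and implementedRule are names I made up; implementedRule mirrors the quoted String.java snippet, and I am assuming the surrounding loop has already handled the c1 == c2 case, as the first comment in the snippet implies.

```java
final class CaseCheckDemo {
    // The rule as the javadoc states it: upper- and lowercase the ORIGINAL chars.
    static boolean documentedRule(char c1, char c2) {
        return c1 == c2
            || Character.toUpperCase(c1) == Character.toUpperCase(c2)
            || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    // The rule as the quoted snippet applies it: lowercase the UPPERCASED chars.
    static boolean implementedRule(char c1, char c2) {
        if (c1 == c2) {
            return true;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return true;
        }
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    private static void show(char c1, char c2) {
        System.out.printf("U+%04X vs U+%04X: documented=%b, implemented=%b, equalsIgnoreCase=%b%n",
                (int) c1, (int) c2,
                documentedRule(c1, c2),
                implementedRule(c1, c2),
                String.valueOf(c1).equalsIgnoreCase(String.valueOf(c2)));
    }

    public static void main(String[] args) {
        show((char) 0x0130, (char) 0x0131); // the two pairs from the question
        show((char) 0x03d1, (char) 0x03f4);
    }
}
```

For both pairs the documented rule fails while the implemented one succeeds, matching the output in the question. Taking the first pair: U+0131 uppercases to 'I' while U+0130 is already uppercase, so u1 and u2 differ, but both of those then lowercase to 'i'.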
