Remove accents from a String

Question

Is there any way in Android (which, to my knowledge, doesn't have java.text.Normalizer) to remove the accents from a String? E.g. "éàù" becomes "eau".

I'd like to avoid parsing the String to check each character if possible!

Answer 1

java.text.Normalizer is available in Android (in the latest versions, anyway). You can use it.

EDIT: For reference, here is how to use Normalizer:

string = Normalizer.normalize(string, Normalizer.Form.NFD);
string = string.replaceAll("[^\\p{ASCII}]", "");

(pasted from the link in the comments below)
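
For anyone who wants to try this outside an app, here is a minimal self-contained sketch of the same two steps. The class name `AsciiFolder` and the method name `toAscii` are made up for illustration. The test strings are written as Unicode escapes ("\u00e9\u00e0\u00f9" is "éàù") so the example does not depend on the source-file encoding.

```java
import java.text.Normalizer;

// Hypothetical helper wrapping the two lines from the answer above.
final class AsciiFolder {
    private AsciiFolder() {}

    // Step 1: NFD splits each accented letter into base letter + combining mark.
    // Step 2: the regex drops every character that is not plain ASCII.
    static String toAscii(String input) {
        if (input == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u00e9\u00e0\u00f9"));                 // prints "eau"
        System.out.println(toAscii("\u00dcn\u00efc\u00f6d\u00e9"));        // prints "Unicode"
    }
}
```

Keep in mind that the second step removes any non-ASCII character that survives decomposition, not only the accent marks, so text in non-Latin scripts would be emptied out.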

Answer 2

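I've adjusted Rabi's solution to my needs; I hope it helps someone:

import java.util.HashMap;
import java.util.Map;

// Lookup table from accented characters to replacements; filled lazily on the first call.
private static Map<Character, Character> MAP_NORM;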
public static String removeAccents(String value)
{
    if (MAP_NORM == null || MAP_NORM.size() == 0)
    {
        MAP_NORM = new HashMap<Character, Character>();
        MAP_NORM.put('À', 'A');
        MAP_NORM.put('Á', 'A');
        MAP_NORM.put('Â', 'A');
        MAP_NORM.put('Ã', 'A');
        MAP_NORM.put('Ä', 'A');
        MAP_NORM.put('È', 'E');
        MAP_NORM.put('É', 'E');
        MAP_NORM.put('Ê', 'E');
        MAP_NORM.put('Ë', 'E');
        MAP_NORM.put('Í', 'I');
        MAP_NORM.put('Ì', 'I');
        MAP_NORM.put('Î', 'I');
        MAP_NORM.put('Ï', 'I');
        MAP_NORM.put('Ù', 'U');
        MAP_NORM.put('Ú', 'U');
        MAP_NORM.put('Û', 'U');
        MAP_NORM.put('Ü', 'U');
        MAP_NORM.put('Ò', 'O');
        MAP_NORM.put('Ó', 'O');
        MAP_NORM.put('Ô', 'O');
        MAP_NORM.put('Õ', 'O');
        MAP_NORM.put('Ö', 'O');
        MAP_NORM.put('Ñ', 'N');
        MAP_NORM.put('Ç', 'C');
        MAP_NORM.put('ª', 'A');
        MAP_NORM.put('º', 'O');
        MAP_NORM.put('§', 'S');
        MAP_NORM.put('³', '3');
        MAP_NORM.put('²', '2');
        MAP_NORM.put('¹', '1');
        MAP_NORM.put('à', 'a');
        MAP_NORM.put('á', 'a');
        MAP_NORM.put('â', 'a');
        MAP_NORM.put('ã', 'a');
        MAP_NORM.put('ä', 'a');
        MAP_NORM.put('è', 'e');
        MAP_NORM.put('é', 'e');
        MAP_NORM.put('ê', 'e');
        MAP_NORM.put('ë', 'e');
        MAP_NORM.put('í', 'i');
        MAP_NORM.put('ì', 'i');
        MAP_NORM.put('î', 'i');
        MAP_NORM.put('ï', 'i');
        MAP_NORM.put('ù', 'u');
        MAP_NORM.put('ú', 'u');
        MAP_NORM.put('û', 'u');
        MAP_NORM.put('ü', 'u');
        MAP_NORM.put('ò', 'o');
        MAP_NORM.put('ó', 'o');
        MAP_NORM.put('ô', 'o');
        MAP_NORM.put('õ', 'o');
        MAP_NORM.put('ö', 'o');
        MAP_NORM.put('ñ', 'n');
        MAP_NORM.put('ç', 'c');
    }

    if (value == null) {
        return "";
    }

    StringBuilder sb = new StringBuilder(value);

    for(int i = 0; i < value.length(); i++) {
        Character c = MAP_NORM.get(sb.charAt(i));
        if(c != null) {
            sb.setCharAt(i, c.charValue());
        }
    }

    return sb.toString();

}

Answer 3

This is probably not the most efficient solution, but it does the trick and it works on all Android versions:

import java.util.HashMap;
import java.util.Map;

// Lookup table for Greek: accented vowels (both cases) map to the unaccented
// lowercase letter, and final sigma maps to regular sigma.
private static Map<Character, Character> MAP_NORM;
static { // Greek characters normalization
    MAP_NORM = new HashMap<Character, Character>();
    MAP_NORM.put('ά', 'α');
    MAP_NORM.put('έ', 'ε');
    MAP_NORM.put('ί', 'ι');
    MAP_NORM.put('ό', 'ο');
    MAP_NORM.put('ύ', 'υ');
    MAP_NORM.put('ή', 'η');
    MAP_NORM.put('ς', 'σ');
    MAP_NORM.put('ώ', 'ω');
    MAP_NORM.put('Ά', 'α');
    MAP_NORM.put('Έ', 'ε');
    MAP_NORM.put('Ί', 'ι');
    MAP_NORM.put('Ό', 'ο');
    MAP_NORM.put('Ύ', 'υ');
    MAP_NORM.put('Ή', 'η');
    MAP_NORM.put('Ώ', 'ω');
}

public static String removeAccents(String s) {
    if (s == null) {
        return null;
    }
    StringBuilder sb = new StringBuilder(s);

    for(int i = 0; i < s.length(); i++) {
        Character c = MAP_NORM.get(sb.charAt(i));
        if(c != null) {
            sb.setCharAt(i, c.charValue());
        }
    }

    return sb.toString();
}

Answer 4

While Guillaume's answer does work, it strips all non-ASCII characters from the string. If you wish to preserve them, try this code (where string is the string to simplify). Note that it also converts the result to lowercase:

// Convert input string to decomposed Unicode (NFD) so that the
// diacritical marks used in many European scripts (such as the
// "C WITH CIRCUMFLEX" → ĉ) become separate characters.
// Also use compatibility decomposition (K) so that characters,
// that have the exact same meaning as one or more other
// characters (such as "㎏" → "kg" or "ﾋ" → "ヒ"), match when
// comparing them.
string = Normalizer.normalize(string, Normalizer.Form.NFKD);

StringBuilder result = new StringBuilder();

int offset = 0, strLen = string.length();
while(offset < strLen) {
    int character = string.codePointAt(offset);
    offset += Character.charCount(character);

    // Only process characters that are not combining Unicode
    // characters. This way all the decomposed diacritical marks
    // (and some other not-that-important modifiers), that were
    // part of the original string or produced by the NFKD
    // normalizer above, disappear.
    switch(Character.getType(character)) {
        case Character.NON_SPACING_MARK:
        case Character.COMBINING_SPACING_MARK:
            // Some combining character found
            break;

        default:
            result.appendCodePoint(Character.toLowerCase(character));
    }
}

// Since we stripped all combining Unicode characters in the
// previous while-loop there should be no combining character
// remaining in the string and the composed and decomposed
// versions of the string should be equivalent. This also means
// we do not need to convert the string back to composed Unicode
// before returning it.
return result.toString();
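
The fragment above ends with a return, so it is meant to sit inside a method. Here is a minimal sketch of one way to package it, under the assumption that lowercasing is wanted as in the original; the class name `AccentStripper` and method name `simplify` are invented for this example. Inputs are written as Unicode escapes so the result does not depend on the source-file encoding.

```java
import java.text.Normalizer;

// Hypothetical wrapper around the loop from the answer above.
final class AccentStripper {
    private AccentStripper() {}

    // NFKD-decompose, skip combining marks, lowercase everything else.
    static String simplify(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        StringBuilder result = new StringBuilder(decomposed.length());
        int offset = 0;
        while (offset < decomposed.length()) {
            int codePoint = decomposed.codePointAt(offset);
            offset += Character.charCount(codePoint);
            int type = Character.getType(codePoint);
            if (type == Character.NON_SPACING_MARK
                    || type == Character.COMBINING_SPACING_MARK) {
                continue; // a detached diacritic: leave it out
            }
            result.appendCodePoint(Character.toLowerCase(codePoint));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // "Creme Brulee" with grave, circumflex and acute accents
        System.out.println(simplify("Cr\u00e8me Br\u00fbl\u00e9e")); // prints "creme brulee"
        // U+338F SQUARE KG is expanded by the compatibility (K) decomposition
        System.out.println(simplify("\u338f"));                      // prints "kg"
    }
}
```

Unlike the ASCII-only regex, this keeps letters from other scripts: the Greek word "\u03a9\u03bc\u03ad\u03b3\u03b1" (capital omega, mu, epsilon with tonos, gamma, alpha) comes back as the same word in lowercase with the tonos removed.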

Answer 5

All accented characters lie outside 7-bit ASCII, with decimal code values greater than 127 (the common Western European ones are in the extended ASCII range). So you could enumerate all the characters in a string and, if a character's decimal code value is greater than 127, map it back to your desired equivalent. There is no easy way to map accented characters back to their non-accented counterparts: you would have to keep some sort of map in memory to map the extended decimal codes back to the unaccented characters.
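
A rough sketch of that idea, assuming a small hand-written table: the class name `ManualAccentMap`, the method name `replaceMapped`, and the choice of entries are all illustrative, and a real table would need many more characters. The accented letters are written as Unicode escapes (for example \u00e9 is "é") so the table does not depend on the source-file encoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory map from accented characters to plain ones.
final class ManualAccentMap {
    private static final Map<Character, Character> REPLACEMENTS = new HashMap<>();

    static {
        // Lowercase a, e, i, o, u with grave, acute, circumflex, diaeresis,
        // followed by c with cedilla and n with tilde (illustrative subset).
        String accented = "\u00e0\u00e1\u00e2\u00e4\u00e8\u00e9\u00ea\u00eb"
                + "\u00ec\u00ed\u00ee\u00ef\u00f2\u00f3\u00f4\u00f6"
                + "\u00f9\u00fa\u00fb\u00fc\u00e7\u00f1";
        String plain = "aaaaeeeeiiiioooouuuucn";
        for (int i = 0; i < accented.length(); i++) {
            REPLACEMENTS.put(accented.charAt(i), plain.charAt(i));
        }
    }

    private ManualAccentMap() {}

    // Only characters above 127 are looked up; unmapped ones are kept as-is.
    static String replaceMapped(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 127) {
                Character replacement = REPLACEMENTS.get(c);
                sb.append(replacement != null ? replacement.charValue() : c);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceMapped("\u00e9\u00e0\u00f9")); // prints "eau"
    }
}
```

The obvious drawback is the one the answer already points out: every character you care about has to be listed by hand.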

5 answers

Answer 1: 84, accepted, 2011-12-15 16:54:46
Answer 2: 4, 2013-11-29 15:59:56
Answer 3: 3, 2012-03-24 07:13:16
Answer 4: 2, 2015-08-16 00:07:04
Answer 5: 0, 2011-12-15 16:51:21