How to filter a Java String to get only alphabet characters?

Question

I'm generating an XML file to make payments, and I have a constraint on users' full names. That parameter only accepts alphabetic characters (a-zA-Z) plus whitespace to separate names and surnames.

I can't find an easy way to filter this. How can I build a regular expression or a filter to get the desired output?

Example:

'Carmen López-Delina Santos' must become 'Carmen LopezDelina Santos'

I need to transform accented vowels into plain vowels (á > a, à > a, â > a, and so on), and also remove special characters such as dots, hyphens, etc.

Thanks!

Answer 1

You can first use a Normalizer and then remove the undesired characters:

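import java.text.Normalizer;

String input = "Carmen López-Delina Santos";
// NFD splits each accented letter into a base letter plus combining marks
String withoutAccent = Normalizer.normalize(input, Normalizer.Form.NFD);
// Drop everything that is not an ASCII letter or a space (this also drops the marks)
String output = withoutAccent.replaceAll("[^a-zA-Z ]", "");
System.out.println(output); // prints Carmen LopezDelina Santos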

Note that this may not work for every non-ASCII letter in every language: a letter that has no decomposition into an ASCII base letter is simply deleted. One such example is the Turkish dotless ı (U+0131).

The alternative in that situation is probably to list all the possible letters and their replacements...
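A minimal sketch of that "list the letters" fallback, combined with the Normalizer approach above. The class name NameSanitizer, the method sanitize, and the contents of the FALLBACK table are all hypothetical and deliberately incomplete; which letters to list and what to map them to depends on the names you expect. Non-ASCII characters are written as Unicode escapes so the file compiles regardless of source encoding (requires Java 9+ for Map.of).

```java
import java.text.Normalizer;
import java.util.Map;

public class NameSanitizer {
    // Hypothetical, incomplete table of letters that NFD does not decompose
    // into an ASCII base letter. Extend it for the languages you need.
    private static final Map<Character, String> FALLBACK = Map.of(
        '\u0131', "i",   // Turkish dotless i
        '\u00F8', "o",   // o with stroke
        '\u00D8', "O",   // O with stroke
        '\u00DF', "ss",  // German sharp s
        '\u0142', "l",   // l with stroke
        '\u0141', "L",   // L with stroke
        '\u00E6', "ae",  // ae ligature
        '\u00C6', "AE"   // AE ligature
    );

    public static String sanitize(String input) {
        // Step 1: replace the explicitly listed letters.
        StringBuilder mapped = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            String replacement = FALLBACK.get(c);
            if (replacement != null) {
                mapped.append(replacement);
            } else {
                mapped.append(c);
            }
        }
        // Step 2: decompose the remaining accented letters into base + marks.
        String decomposed = Normalizer.normalize(mapped, Normalizer.Form.NFD);
        // Step 3: keep only ASCII letters and spaces.
        return decomposed.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        // "Carmen López-Delina Santos"
        System.out.println(sanitize("Carmen L\u00F3pez-Delina Santos")); // Carmen LopezDelina Santos
        // Name containing s-cedilla, dotless i, o-stroke and sharp s
        System.out.println(sanitize("I\u015F\u0131k S\u00F8ren Wei\u00DF")); // Isik Soren Weiss
    }
}
```

The table is applied before normalization so that the listed letters are transliterated rather than deleted; anything that is neither listed nor decomposable is still removed by the final replaceAll.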

Answer 2

You can use this removeAccents method, followed by a replaceAll with [^A-Za-z ]:

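import java.text.Normalizer;
import java.text.Normalizer.Form;

// Decomposes accented characters, then strips the combining diacritical marks
public static String removeAccents(String text) {
  return text == null ? null :
    Normalizer.normalize(text, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}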

The Normalizer decomposes each original character into a base character followed by one or more combining diacritical marks (a single letter can carry several marks in some languages). á, é and í share the same mark: U+0301, the combining acute accent.

The \\p{InCombiningDiacriticalMarks}+ regular expression (shown as a Java string literal, hence the doubled backslash) matches runs of characters in the Combining Diacritical Marks block (U+0300 to U+036F), and we replace them with an empty string.
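To make the decomposition visible, here is a small sketch that prints the code points before and after NFD normalization and then strips the marks. The class DecompositionDemo and its helper methods are illustrative names only, not part of the answer's code.

```java
import java.text.Normalizer;

public class DecompositionDemo {
    // NFD: canonical decomposition into base character + combining marks
    public static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Formats every code point of s as U+XXXX, separated by spaces
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Decomposes, then removes characters from the Combining Diacritical Marks block
    public static String stripMarks(String s) {
        return decompose(s).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        String aAcute = "\u00E1"; // a with acute accent, precomposed
        System.out.println(codePoints(aAcute));            // U+00E1
        System.out.println(codePoints(decompose(aAcute))); // U+0061 U+0301

        String eCircumflexAcute = "\u1EBF"; // Vietnamese e with circumflex and acute
        System.out.println(codePoints(decompose(eCircumflexAcute))); // U+0065 U+0302 U+0301

        System.out.println(stripMarks("\u00E1\u00E9\u00ED")); // aei
    }
}
```

The second example shows the "multiple marks" case: one precomposed letter becomes a base letter followed by two combining marks, which is why the regular expression uses the + quantifier.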

And in the caller:

String original = "Carmen López-Delina Santos";
String res = removeAccents(original).replaceAll("[^A-Za-z ]", "");
System.out.println(res);

See IDEONE demo

Answer 1 (14, accepted): 2015-06-11 11:59:35

Answer 2 (1): 2015-06-11 12:08:14