How to normalize/polish a text in Java?
What method would you suggest for normalizing text in Java? For example:
String raw = " This is\n a test\n\r ";
String txt = normalize(raw);
assert "This is a test".equals(txt);
I'm thinking about the StringUtils.replace() and .strip() methods, but maybe there is an easier way.
If it is only whitespace, try the following:
String txt = raw.replaceAll("\\s+", " ").trim();
I see that the string actually contains a newline that you want to get rid of. In that case I would recommend using a regex like so:
Pattern.compile("\\s+").matcher(text).replaceAll(" ").trim();
You can always store the compiled regex and reuse it for better performance.
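A minimal sketch of storing the compiled pattern, assuming a hypothetical helper class named WhitespaceNormalizer (the class, field, and method names are illustrative and not part of any library):

```java
import java.util.regex.Pattern;

final class WhitespaceNormalizer {
    // Compiled once and reused on every call; Pattern instances are immutable and thread-safe.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String normalize(String text) {
        // Collapse each run of whitespace to a single space, then drop leading/trailing spaces.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize(" This is\n a test\n\r ")); // prints "This is a test"
    }
}
```

String.replaceAll() compiles its regex on every call, so the saving mostly matters when normalize() runs in a hot loop.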
It depends a little on exactly what you want to strip. If it is certain specific characters, then replaceAll() would be the way to go, as posted by @Yaneeve. If the needs are more general, then you might want to look at normalizing the string using the Normalizer.
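As a rough sketch of the Normalizer route, assuming "more general" means Unicode normalization followed by whitespace collapsing (the UnicodeNormalizer class name and the choice of NFKC are assumptions for illustration, not something the answer specifies):

```java
import java.text.Normalizer;

final class UnicodeNormalizer {
    static String normalize(String raw) {
        // NFKC composes combining sequences and folds compatibility characters,
        // e.g. a no-break space (U+00A0) becomes an ordinary space.
        String unified = Normalizer.normalize(raw, Normalizer.Form.NFKC);
        // After NFKC, the default \s class is enough to collapse the remaining whitespace.
        return unified.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize(" This\u00A0is\n a test\n\r "));
    }
}
```

Without the NFKC step, the default \s class would leave a no-break space untouched, since it only matches the ASCII whitespace characters.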
Apache Commons eventually added this functionality: org.apache.commons.lang3.StringUtils.normalizeSpace(String str)
To remove leading and trailing whitespace, you are looking for String#trim():
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#trim()
If normalization means replacing sequences of spaces, tabs, newlines, and carriage returns, then I'd consider using a simple regular expression with String.split() to extract the separate words, then appending them to a StringBuilder with the spacing you'd like in between. If performance really matters, another approach would be to simply loop over the String's characters, looking at each one and deciding whether to append it to a StringBuilder or to discard it.
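Both ideas could be sketched roughly as follows. The class name SpaceCollapser and the method names viaSplit and viaLoop are hypothetical, and a single space is assumed as the desired separator:

```java
final class SpaceCollapser {
    // Approach 1: split on whitespace runs, then rejoin the words with single spaces.
    static String viaSplit(String raw) {
        StringBuilder sb = new StringBuilder();
        for (String word : raw.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty first token when the input starts with whitespace
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word);
        }
        return sb.toString();
    }

    // Approach 2: one pass over the characters, no regex involved.
    static String viaLoop(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isWhitespace(c)) {
                // Remember the gap, but only once some word has already been written.
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        // A gap still pending at the end is trailing whitespace and is simply dropped.
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = " This is\n a test\n\r ";
        System.out.println(viaSplit(raw)); // prints "This is a test"
        System.out.println(viaLoop(raw));  // prints "This is a test"
    }
}
```

Note that Character.isWhitespace() and the regex class \s do not cover exactly the same set of characters, so the two methods can differ on unusual Unicode input.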
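import java.util.Scanner;

private static String normalize(String raw) {
    StringBuilder sb = new StringBuilder();
    // Scanner splits on whitespace by default, so next() returns one word at a time.
    try (Scanner scanner = new Scanner(raw)) {
        while (scanner.hasNext()) {
            // Add the separator before every word except the first; this also
            // avoids deleting from an empty builder when raw has no words.
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(scanner.next());
        }
    }
    return sb.toString();
}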