I want to build an index for my program, and one of the most important steps is normalizing text. For example, I need to convert "[(Mac Pro @apple)]" to "macproapple", which means filtering out blank spaces, punctuation ([()]), and special characters (@). My code looks like this:
StringBuilder sb = new StringBuilder(text);
sb = filterPunctuations(sb);
sb = filterSpecialChars(sb);
sb = filterBlankSpace(sb);
sb = toLower(sb);
Because this would generate a lot of String objects, I decided to use StringBuilder, but I don't know how to implement these filters with it. Does anyone have any suggestions? I also need to handle Chinese characters.
You can use the replaceAll API with a regular expression:
String originalText = "[(Mac Pro @apple)]";
String removedString = originalText.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase();
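Internally, String.replaceAll delegates to Matcher.replaceAll, which accumulates the result in a single StringBuffer (a StringBuilder since Java 9), so you need not worry about many intermediate objects being created in memory.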
Here is the code for replaceAll in the Matcher class:
public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}
Try this:
class Solution
{
    public static void main (String[] args)
    {
        String s = "[(Mac Pro @apple)]";
        s = s.replaceAll("[^A-Za-z]", "");
        System.out.println(s);
    }
}
This gives the output of
MacProapple
A short explanation of the lines above: s.replaceAll("[^A-Za-z]", "") removes every character in the string that is not (the ^ denotes negation) in the ranges A-Z or a-z. Regex in Java is explained here.
If you want the string in lowercase at the end, call s.toLowerCase() and assign the result back to s, since strings are immutable.