I need a way to fix capitalization in abbreviations found within a String. Assume all abbreviations are correctly spaced.
For example,
"Robert a.k.a. Bob A.k.A. dr. Bobby"
becomes:
"Robert A.K.A. Bob A.K.A. Dr. Bobby"
Correctly capitalized abbreviations will be known ahead of time, stored in a Collection of some sort.
I was thinking of an algorithm like this:
private String fix(String s) {
    StringBuilder builder = new StringBuilder();
    for (String word : s.split(" ")) {
        if (collection.contains(word.toUpperCase())) {
            // word = correct abbreviation here
        }
        builder.append(word);
        builder.append(" ");
    }
    return builder.toString().trim();
}
But as far as I know, there are a couple of problems with this approach:
I have a feeling that this can be solved with a regex, iteratively matching and replacing the correct abbreviation. But if not, how should I approach this problem?
Instead of using a regex or rolling your own implementation, I would suggest using a utility library. WordUtils in Apache Commons Lang is perfect for the job:
String input = "Robert a.k.a. Bob A.k.A. dr. Bobby";
String capitalized = WordUtils.capitalize(input, '.', ' ');
System.out.println(capitalized);
This prints out
Robert A.K.A. Bob A.K.A. Dr. Bobby
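To make clear what delimiter-based capitalization does, here is a rough plain-Java sketch of the same idea. It is not the library's actual implementation; the class `DelimiterCapitalizer` and its method are hypothetical names used only for illustration. It upper-cases the first character of the input and every character that follows one of the given delimiters.

```java
// Hypothetical helper for illustration only; not part of Apache Commons Lang.
class DelimiterCapitalizer {

    // Upper-cases the first character of the input and every character
    // that directly follows one of the delimiters.
    static String capitalize(String input, char... delimiters) {
        StringBuilder out = new StringBuilder(input.length());
        boolean capitalizeNext = true; // the start of the string counts as a boundary
        for (char c : input.toCharArray()) {
            if (isDelimiter(c, delimiters)) {
                out.append(c);
                capitalizeNext = true;
            } else if (capitalizeNext) {
                out.append(Character.toUpperCase(c));
                capitalizeNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    private static boolean isDelimiter(char c, char[] delimiters) {
        for (char d : delimiters) {
            if (c == d) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(capitalize("Robert a.k.a. Bob A.k.A. dr. Bobby", '.', ' '));
        System.out.println(capitalize("the o.o.", '.', ' '));
    }
}
```

Note that this kind of capitalization touches every word, not only known abbreviations: in the sketch, "the o.o." becomes "The O.O.". If some abbreviations must keep lowercase letters, a lookup against the known spellings is a better fit.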
You do not have to use a regex; your solution looks reasonable (although it may be slow if you have a lot of data to process).
For abbreviations that contain lowercase letters, e.g. Dr., you could use a case-insensitive string comparison rather than toUpperCase. Actually, that is only useful if you are comparing the strings directly yourself. What you really need is a case-insensitive map. Perhaps:
Map<String, String> collection = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
If the abbreviation starts or ends with punctuation, then make sure the corresponding key in your collection does too.
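A minimal sketch of how such a map could be plugged into the loop from the question, assuming each abbreviation is stored as both key and value so that a lookup returns the canonical spelling. The class name `AbbreviationFixer` and its constructor are hypothetical, chosen only for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class for illustration; names are not from the original post.
class AbbreviationFixer {

    // Keys are compared case-insensitively; values hold the canonical spelling.
    private final Map<String, String> canonical =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    AbbreviationFixer(String... abbreviations) {
        for (String abbr : abbreviations) {
            canonical.put(abbr, abbr);
        }
    }

    String fix(String s) {
        StringBuilder builder = new StringBuilder();
        for (String word : s.split(" ")) {
            // Known abbreviation: use the stored spelling; otherwise keep the word as is.
            builder.append(canonical.getOrDefault(word, word));
            builder.append(' ');
        }
        return builder.toString().trim();
    }

    public static void main(String[] args) {
        AbbreviationFixer fixer = new AbbreviationFixer("A.K.A.", "Dr.");
        System.out.println(fixer.fix("Robert a.k.a. Bob A.k.A. dr. Bobby"));
    }
}
```

Each lookup costs O(log n) comparisons in the number of abbreviations, instead of a linear scan over a list.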
This is how I went about it (updated after reading the OP's comments). With the code below, it prints:
Robert A.K.A. Bob A.K.A. Dr. Bobby The o.o.
import java.util.ArrayList;
import java.util.List;

public class Fixer {
    // Known abbreviations, stored with their correct capitalization
    private final List<String> collection = new ArrayList<>();

    public Fixer() {
        collection.add("Dr.");
        collection.add("A.K.A.");
        collection.add("o.o.");
    }

    /* app entry point */
    public static void main(String[] args) {
        String testCase = "robert a.k.a. bob A.k.A. dr. bobby the o.o.";
        Fixer fixer = new Fixer();
        String result = fixer.fix(testCase);
        System.out.println(result);
    }

    private String fix(String s) {
        StringBuilder builder = new StringBuilder();
        for (String word : s.split(" ")) {
            String abbr = getAbbr(word);
            if (abbr != null) {
                // Known abbreviation: use the stored spelling
                builder.append(abbr);
            } else if (!word.isEmpty()) {
                // Ordinary word: capitalize the first letter only
                builder.append(word.substring(0, 1).toUpperCase());
                builder.append(word.substring(1));
            }
            builder.append(" ");
        }
        return builder.toString().trim();
    }

    // Returns the correctly capitalized abbreviation, or null if the word is not one
    private String getAbbr(String word) {
        for (String abbr : collection) {
            if (abbr.equalsIgnoreCase(word)) {
                return abbr;
            }
        }
        return null;
    }
}
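For comparison, the regex idea raised in the question can be sketched as well. This is only one possible way to do it, assuming a non-empty list of known abbreviations and whitespace-separated input; the class name `RegexAbbreviationFixer` is hypothetical. Each abbreviation is quoted so that '.' is matched literally, and the alternation is matched case-insensitively between whitespace boundaries.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class for illustration; assumes a non-empty abbreviation list.
class RegexAbbreviationFixer {

    // Case-insensitive lookup from any spelling to the canonical one
    private final Map<String, String> canonical =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    private final Pattern pattern;

    RegexAbbreviationFixer(List<String> abbreviations) {
        for (String abbr : abbreviations) {
            canonical.put(abbr, abbr);
        }
        // Pattern.quote makes '.' literal; the lookarounds require whitespace
        // or the start/end of the string on both sides of the match.
        String alternation = abbreviations.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        pattern = Pattern.compile("(?<!\\S)(?:" + alternation + ")(?!\\S)",
                Pattern.CASE_INSENSITIVE);
    }

    String fix(String s) {
        Matcher m = pattern.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Replace the matched text with its canonical spelling
            m.appendReplacement(out, Matcher.quoteReplacement(canonical.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        RegexAbbreviationFixer fixer =
                new RegexAbbreviationFixer(Arrays.asList("A.K.A.", "Dr."));
        System.out.println(fixer.fix("Robert a.k.a. Bob A.k.A. dr. Bobby"));
    }
}
```

Unlike the loop above, this sketch leaves words that are not abbreviations untouched, and it preserves the original spacing because it never splits the input.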