Java String Fix Capitalization in Abbreviations

I need a way to fix capitalization in abbreviations found within a String. Assume all abbreviations are correctly spaced.

For example,

"Robert a.k.a. Bob A.k.A. dr. Bobby"

becomes:

"Robert A.K.A. Bob A.K.A. Dr. Bobby"

Correctly capitalized abbreviations will be known ahead of time, stored in a Collection of some sort.

I was thinking of an algorithm like this:

private String fix(String s) {
    StringBuilder builder = new StringBuilder();
    for (String word : s.split(" ")) {
        if (collection.contains(word.toUpperCase())) {
            // word = correct abbreviation here
        }
        builder.append(word);
        builder.append(" ");
    }
    return builder.toString().trim();
}

But as far as I know, there are a couple of problems with this approach:

  • If the abbreviation contains a lowercase letter (Dr.), the upper-cased word no longer matches it
  • If the word starts or ends with punctuation ("aka"), it does not match the stored abbreviation

I have a feeling that this can be solved with a regex, iteratively matching and replacing the correct abbreviation. But if not, how should I approach this problem?

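Instead of using a regex or rolling your own implementation, I would suggest using a utility library. WordUtils in Apache Commons Lang is well suited to the job. Note that it capitalizes the first letter after every delimiter in every word, so it does not consult your list of known abbreviations: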

// Needs: import org.apache.commons.lang3.text.WordUtils; (Commons Lang 3 package name)
String input = "Robert a.k.a. Bob A.k.A. dr. Bobby";
String capitalized = WordUtils.capitalize(input, '.', ' ');
System.out.println(capitalized);

This prints out

Robert A.K.A. Bob A.K.A. Dr. Bobby

You do not have to use a regex; your solution looks reasonable (although it may be slow if you have a lot of data to process).

For abbreviations that contain lowercase letters, e.g. Dr., you could use a case-insensitive string comparison rather than toUpperCase. Actually, that is only useful if you are comparing the strings directly yourself. What you really need is a case-insensitive Map, for example a TreeMap ordered by String.CASE_INSENSITIVE_ORDER:

Map<String, String> collection = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);

If the abbreviation starts or ends with punctuation, then make sure the corresponding key in your collection does too.
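As a minimal sketch of the case-insensitive map idea, assuming each key and value is the correctly capitalized abbreviation, the lookup could replace the contains() check in the question's loop. The class name AbbreviationFixer and the two sample entries are made up for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

public class AbbreviationFixer {

    // Keys are compared case-insensitively; values hold the correct spelling.
    private static final Map<String, String> ABBREVIATIONS =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    static {
        // Hypothetical sample data; load your own known abbreviations here.
        ABBREVIATIONS.put("A.K.A.", "A.K.A.");
        ABBREVIATIONS.put("Dr.", "Dr.");
    }

    public static String fix(String s) {
        StringBuilder builder = new StringBuilder();
        for (String word : s.split(" ")) {
            // get() finds "a.k.a." and "A.k.A." under the key "A.K.A."
            String canonical = ABBREVIATIONS.get(word);
            builder.append(canonical != null ? canonical : word);
            builder.append(' ');
        }
        return builder.toString().trim();
    }

    public static void main(String[] args) {
        // prints: Robert A.K.A. Bob A.K.A. Dr. Bobby
        System.out.println(fix("Robert a.k.a. Bob A.k.A. dr. Bobby"));
    }
}
```

Each lookup is O(log n) in the number of abbreviations, and words that are not in the map are left untouched.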

This is how I went about it (updated after reading the OP's comments).

It prints:

Robert A.K.A. Bob A.K.A. Dr. Bobby The o.o.

import java.util.ArrayList;
import java.util.List;

public class Fixer {

    List<String> collection = new ArrayList<>();

    public Fixer() {
        collection.add("Dr.");
        collection.add("A.K.A.");
        collection.add("o.o.");
    }

    /* app entry point */
    public static void main(String[] args) {
        String testCase = "robert a.k.a. bob A.k.A. dr. bobby the o.o.";

        Fixer l = new Fixer();
        String result = l.fix(testCase);

        System.out.println(result);
    }

    private String fix(String s) {
        StringBuilder builder = new StringBuilder();
        for (String word : s.split(" ")) {
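            String abbr = getAbbr(word);
            if (abbr != null) {
                // Known abbreviation: use the stored, correctly capitalized form.
                builder.append(abbr);
            } else if (!word.isEmpty()) {
                // Ordinary word: capitalize the first letter only.
                // The isEmpty() guard avoids an exception on consecutive spaces.
                builder.append(Character.toUpperCase(word.charAt(0)));
                builder.append(word.substring(1));
            }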
            builder.append(" ");
        }
        return builder.toString().trim();
    }

    private String getAbbr(String word) {
        for (String abbr : collection) {
            if (abbr.equalsIgnoreCase(word)) {
                return abbr;
            }
        }
        return null;
    }
}
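For the case raised in the question where a word starts or ends with punctuation, a regex built from the known abbreviations is one option. The following is only a sketch under stated assumptions: the class name RegexAbbreviationFixer is made up, the list of abbreviations is assumed to be non-empty, and "not adjacent to a letter" is assumed to be an acceptable definition of where an abbreviation begins and ends. Unlike the code above, it does not capitalize ordinary words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexAbbreviationFixer {

    // Case-insensitive lookup from any spelling to the correct one.
    private final Map<String, String> canonical =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    private final Pattern pattern;

    public RegexAbbreviationFixer(List<String> abbreviations) {
        List<String> sorted = new ArrayList<>(abbreviations);
        // Longest first, so a longer abbreviation wins over one that is its prefix.
        sorted.sort(Comparator.comparingInt(String::length).reversed());
        StringBuilder alternatives = new StringBuilder();
        for (String abbr : sorted) {
            canonical.put(abbr, abbr);
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            // quote() makes the dots literal instead of "any character".
            alternatives.append(Pattern.quote(abbr));
        }
        // The lookarounds reject matches that are glued to a letter on either side.
        pattern = Pattern.compile(
                "(?<!\\p{L})(?:" + alternatives + ")(?!\\p{L})",
                Pattern.CASE_INSENSITIVE);
    }

    public String fix(String s) {
        Matcher m = pattern.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = canonical.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        RegexAbbreviationFixer fixer =
                new RegexAbbreviationFixer(Arrays.asList("Dr.", "A.K.A."));
        // prints: Robert (A.K.A. Bob), Dr. Bobby
        System.out.println(fixer.fix("Robert (a.k.a. Bob), dr. Bobby"));
    }
}
```

Because the input is scanned once with a single compiled pattern, the text does not need to be split on spaces, and surrounding punctuation such as parentheses or commas is preserved as-is.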

