Split a string based on a pattern in Java: capital letters and numbers

I have the string "3/4Ton" and I want to split it so that:

word[1] = 3/4 and word[2] = Ton.

Right now my code looks like this:

Pattern p = Pattern.compile("[A-Z]{1}[a-z]+");
Matcher m = p.matcher(line);
while (m.find()) {
    System.out.println("The word --> " + m.group());
}

It does split the string on capital letters, for example:

String = MachineryInput

word[1] = Machinery, word[2] = Input

The only problem is that it does not preserve numbers, abbreviations, or runs of capital letters that are not meant to be separate words. Could someone help me with my regular expression?
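To see why the question's approach loses pieces, here is a minimal runnable sketch of the same `Matcher` loop (the class name `AskerPatternDemo` and method `words` are hypothetical, for illustration only). `find()` only reports substrings that match the pattern, so anything that is not one capital followed by lowercase letters, such as `3/4` or the `ABC` in a made-up input like `ABCWord`, is silently skipped rather than kept as its own word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AskerPatternDemo {

    // The pattern from the question: one capital letter followed by lowercase letters.
    private static final Pattern WORD = Pattern.compile("[A-Z]{1}[a-z]+");

    /** Collects every substring the question's pattern matches; everything else is dropped. */
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("MachineryInput")); // [Machinery, Input]
        System.out.println(words("3/4Ton"));         // [Ton]
        System.out.println(words("ABCWord"));        // [Word]
    }
}
```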

Thanks in advance...

You can actually do this with a regex alone, using lookahead and lookbehind (see "Special constructs" on this page: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html).

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class DividerPatternTest {

    /**
     * We'll use this pattern as a divider to split the string into an array.
     * Usage: myString.split(DIVIDER_PATTERN);
     */
    private static final String DIVIDER_PATTERN =
            // either anything that is not an uppercase character,
            // followed by an uppercase character
            "(?<=[^\\p{Lu}])(?=\\p{Lu})"
            // or a lowercase character followed by a digit
            + "|(?<=[\\p{Ll}])(?=\\d)";

    @Test
    public void testStringSplitting() {
        assertEquals(2, "3/4Word".split(DIVIDER_PATTERN).length);
        assertEquals(7, "ManyManyWordsInThisBigThing".split(DIVIDER_PATTERN).length);
        assertEquals(7, "This123/4Mixed567ThingIsDifficult"
                .split(DIVIDER_PATTERN).length);
    }
}
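The reason for using lookaround is that it matches a zero-width position between two characters instead of a character itself, so `split` does not throw anything away. A small sketch of the difference (the class name `LookaroundSplitDemo` is hypothetical): splitting on `[A-Z]` consumes the capitals, and because that match at index 0 is not zero-width, it also leaves a leading empty string, while the lookaround divider keeps every character.

```java
import java.util.Arrays;

class LookaroundSplitDemo {

    // Same divider as above: it matches positions only, so no characters are consumed.
    static final String DIVIDER_PATTERN =
            "(?<=[^\\p{Lu}])(?=\\p{Lu})" + "|(?<=[\\p{Ll}])(?=\\d)";

    public static void main(String[] args) {
        // Splitting on the capital letter itself removes it from the pieces.
        System.out.println(Arrays.toString("MachineryInput".split("[A-Z]")));
        // -> [, achinery, nput]

        // Splitting on the position before the capital keeps it.
        System.out.println(Arrays.toString("MachineryInput".split(DIVIDER_PATTERN)));
        // -> [Machinery, Input]
    }
}
```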

So you can do something like this:

for(String word: myString.split(DIVIDER_PATTERN)){
    System.out.println(word);
}
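Put together as a self-contained program, applied to the strings from the question, the loop might look like the sketch below (`SplitWordsDemo` and `splitWords` are hypothetical names). With this divider, "3/4Ton" gives `3/4` and `Ton`, and "MachineryInput" gives `Machinery` and `Input`. A run of capitals is never split internally, so a made-up input like "NASALaunch" comes back as a single word; whether that is what you want depends on your data.

```java
class SplitWordsDemo {

    // Divider from the answer: split between a non-uppercase character and an
    // uppercase one, or between a lowercase letter and a digit.
    private static final String DIVIDER_PATTERN =
            "(?<=[^\\p{Lu}])(?=\\p{Lu})" + "|(?<=[\\p{Ll}])(?=\\d)";

    static String[] splitWords(String s) {
        return s.split(DIVIDER_PATTERN);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"3/4Ton", "MachineryInput", "NASALaunch"}) {
            System.out.println(input + ":");
            for (String word : splitWords(input)) {
                System.out.println("  " + word);
            }
        }
    }
}
```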

Sean

Using a regex would be nice here, and I bet there's a way to do it, although I'm not a swing-in-on-a-vine regex guy, so I can't help you there. However, one thing you can't avoid: something, somewhere needs to loop over your String eventually. You could do this "on your own" like so:

import java.util.ArrayList;
import java.util.List;

// static so it can be called directly from main()
static String[] splitOnCapitals(String str) {
    List<String> array = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        // an uppercase letter starts a new word: flush what we have so far
        if (Character.isUpperCase(c)) {
            String line = builder.toString().trim();
            if (line.length() > 0) array.add(line);
            builder = new StringBuilder();
        }
        builder.append(c);
    }
    // get the last little bit too (skipped if it is empty, e.g. for an empty input)
    String last = builder.toString().trim();
    if (last.length() > 0) array.add(last);
    return array.toArray(new String[0]);
}

I tested it with the following test driver:

public static void main(String[] args) {
    String test = "3/4 Ton truCk";
    String[] arr = splitOnCapitals(test);
    for(String s : arr) System.out.println(s);

    test = "Start with Capital";
    arr = splitOnCapitals(test);
    for(String s : arr) System.out.println(s);
}

And got the following output:

3/4
Ton tru
Ck
Start with
Capital
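If you prefer a hand-written loop but want the same boundaries as the regex divider in the first answer (rather than splitting at every capital, which is what produces `Ton tru` / `Ck` above), a sketch could compare each character with the one before it. `ManualDividerSplit` and `split` are hypothetical names. It uses `Character.isUpperCase`, `isLowerCase` and `isDigit`, which agree with `\p{Lu}`, `\p{Ll}` and `\d` for ASCII text but may differ for some other Unicode characters (Java's `\d` only matches `[0-9]` by default). It also works on `char`s, so surrogate pairs are not treated specially, and unlike the loop above it does not trim spaces.

```java
import java.util.ArrayList;
import java.util.List;

class ManualDividerSplit {

    /**
     * Splits before an uppercase letter that follows a non-uppercase character,
     * and before a digit that follows a lowercase letter. Nothing is dropped.
     */
    static List<String> split(String s) {
        List<String> words = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            char prev = s.charAt(i - 1);
            char cur = s.charAt(i);
            boolean boundary =
                    (!Character.isUpperCase(prev) && Character.isUpperCase(cur))
                    || (Character.isLowerCase(prev) && Character.isDigit(cur));
            if (boundary) {
                words.add(s.substring(start, i));
                start = i;
            }
        }
        // add the remaining tail (nothing to add for an empty string)
        if (start < s.length()) {
            words.add(s.substring(start));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("3/4Ton"));
        System.out.println(split("This123/4Mixed567ThingIsDifficult"));
    }
}
```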
