
Java: Split string by number of characters but with guarantee that string will be split only after whitespace

I want to achieve something like this.

String str = "This is just a sample string";

List<String> strChunks = splitString(str,8);

and strChunks should be:

"This is ","just a ","sample ","string"

Please note that a string like "sample " has only 7 characters, because with 8 characters it would be "sample s", which would break my next word, "string".

Also, we can assume that a word will never be longer than the second argument of the method (8 in the example), because in my use case the second argument is always the constant 32000.

The obvious approach I can think of is to loop through the given string, break it after 8 characters, and then search backwards from that point for the nearest whitespace. Then repeat the same thing for the remaining string.
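A minimal sketch of that loop-and-look-back approach, assuming (as the question does) that every word plus its following whitespace fits within the limit. The class name WhitespaceChunker is hypothetical. If no whitespace is found inside a window, the sketch falls back to a hard cut instead of looping forever.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceChunker {

    // Hypothetical helper: chunks of at most `limit` characters, each ending just
    // after a whitespace character (except the last chunk or a hard cut).
    // Assumes, as the question does, that each word fits within `limit`.
    static List<String> splitString(String str, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < str.length()) {
            int end = start + limit;
            if (end >= str.length()) {
                // The rest of the string fits into one chunk.
                chunks.add(str.substring(start));
                break;
            }
            // Walk back from the end of the window to just after the last whitespace.
            int cut = end;
            while (cut > start && !Character.isWhitespace(str.charAt(cut - 1))) {
                cut--;
            }
            if (cut == start) {
                // No whitespace in the window (a word longer than limit): hard cut.
                cut = end;
            }
            chunks.add(str.substring(start, cut));
            start = cut;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints [This is , just a , sample , string]
        System.out.println(splitString("This is just a sample string", 8));
    }
}
```

Checking Character.isWhitespace rather than only ' ' follows the question's wording ("after whitespace"); compare against ' ' instead if only spaces should count as break points.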

Is there a more elegant way to achieve this? Is there a utility method already available in a standard third-party library such as Guava or Apache Commons?

Splitting on "(?<=\\G.{7,}\\s)" produces the result that you need (demo).

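\G matches the end of the previous match; .{7,} means seven or more characters of any kind; \s means a whitespace character. Note that .{7,} sets a minimum rather than a maximum: each chunk ends at the first whitespace that follows at least seven characters, so on the sample string the second chunk comes out as "just a sample " rather than "just a ".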
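Because .{7,} bounds the chunk length from below, a different regex technique is needed to cap it from above: instead of split(), collect successive Matcher.find() matches. In this sketch each match is either the whole remainder (at most limit characters) or up to limit - 1 characters followed by one whitespace character. It assumes single-line input, a limit of at least 2, and that every word plus its trailing whitespace fits in the limit (otherwise find() silently skips characters). RegexChunker is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexChunker {

    // Hypothetical helper. Each match is either the remainder of the input
    // (1..limit characters up to the end) or 1..limit-1 characters followed by
    // one whitespace character, so no match is longer than limit.
    // Assumes single-line input and that every word plus its trailing
    // whitespace fits in limit; otherwise find() skips characters.
    static List<String> splitString(String str, int limit) {
        if (limit < 2) {
            throw new IllegalArgumentException("limit must be at least 2");
        }
        Pattern chunk = Pattern.compile(
                ".{1," + limit + "}$|.{1," + (limit - 1) + "}\\s");
        Matcher m = chunk.matcher(str);
        List<String> chunks = new ArrayList<>();
        while (m.find()) {
            chunks.add(m.group());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints [This is , just a , sample , string]
        System.out.println(splitString("This is just a sample string", 8));
    }
}
```

This trades the index arithmetic of a hand-written loop for regex backtracking, which may scan up to about limit characters for each chunk.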

This is not a standard method, but it might suit your needs.

See it on http://ideone.com/2RFIZd

import java.util.ArrayList;
import java.util.List;

public static List<String> splitString(String str, int chunksize) {
    List<String> list = new ArrayList<>();
    StringBuilder builder = new StringBuilder();
    int count = 0; // characters appended to the current chunk
    for (char character : str.toCharArray()) {
        builder.append(character);
        if (count < chunksize - 1) {
            // Still filling the first chunksize - 1 characters of the chunk.
            count++;
        } else if (character == ' ') {
            // Enough characters buffered: close the chunk at this space.
            list.add(builder.toString());
            count = 0;
            builder.setLength(0);
        } else {
            // Keep appending until the next space.
            count++;
        }
    }
    // Whatever is left after the last flush forms the final chunk.
    list.add(builder.toString());
    return list;
}

Please note that I used human notation for the string length, because that's what your sample reflects (8 = position 7 in the string); that's why chunksize - 1 is there.

This method takes 3 milliseconds on a text the size of http://catdir.loc.gov/catdir/enhancements/fy0711/2006051179-s.html

  • Splitting the string using method 1:

     // Fixed-width chunks of 8 characters; whitespace is not taken into account.
     String text = "This is just a sample string";
     List<String> strings = new ArrayList<String>();
     int index = 0;
     while (index < text.length()) {
         strings.add(text.substring(index, Math.min(index + 8, text.length())));
         index += 8;
     }
     for (String s : strings) {
         System.out.println("[" + s + "]");
     }
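  • Splitting the string using method 2:

     // Splits every 8 characters (\G = end of the previous split), again ignoring whitespace.
     String[] s = text.split("(?<=\\G.{" + 8 + "})");
     for (int i = 0; i < s.length; i++) {
         System.out.println("[" + s[i] + "]");
     }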

This uses a hacky reduction to get it done without much code:

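import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

String str = "This is just a sample string";
List<String> parts = new ArrayList<>();
// Split after every space, so each piece keeps its trailing space.
parts.add(Arrays.stream(str.split("(?<= )"))
  .reduce((a, b) -> {
    // Merge the pair while it still fits in 8 characters...
    if (a.length() + b.length() <= 8)
        return a + b;
    // ...otherwise emit the accumulated chunk and start a new one with b.
    parts.add(a);
    return b;
  }).get()); // get() yields the last accumulated chunk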

See the demo using edge-case input (which breaks some other answers!).

This splits after each space, then, for each pair, either joins the parts or adds the accumulated part to the list, depending on their combined length.
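The same greedy packing can be written as a plain loop, which avoids relying on side effects inside reduce(). A sketch; GreedyJoinChunker is a hypothetical name, and like the reduce() answer it treats only the space character as a break point.

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyJoinChunker {

    // Hypothetical helper: same greedy packing as the reduce() answer, as a loop.
    // Like that answer, only ' ' is treated as a break point.
    static List<String> splitString(String str, int limit) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // "(?<= )" splits after every space, so each token keeps its trailing space.
        for (String token : str.split("(?<= )")) {
            // Close the current chunk if adding this token would exceed the limit.
            if (current.length() > 0 && current.length() + token.length() > limit) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(token);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints [This is , just a , sample , string]
        System.out.println(splitString("This is just a sample string", 8));
    }
}
```

As in the reduce() version, a single token longer than limit ends up as its own oversized chunk; unlike it, an empty input yields an empty list rather than a list holding one empty string.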

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 