简体   繁体   中英

Split string to equal length substrings in Java

How to split the string "Thequickbrownfoxjumps" to substrings of equal size in Java. Eg. "Thequickbrownfoxjumps" of 4 equal size should give the output.

["Theq","uick","brow","nfox","jump","s"]

Similar Question:

Split string into equal-length substrings in Scala

Here's the regex one-liner version:

System.out.println(Arrays.toString(
    "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
));

\\G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \\A . The enclosing lookbehind matches the position that's four characters along from the end of the last match.

Both lookbehind and \\G are advanced regex features, not supported by all flavors. Furthermore, \\G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java , Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \\G , and couldn't be used this way even if JS did support lookbehind.

I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one's just about the opposite of that. ;)

Also, this doesn't work in Android, which doesn't support the use of \\G in lookbehinds.

Well, it's fairly easy to do this with simple arithmetic and string operations:

public static List<String> splitEqually(String text, int size) {
    // Give the list the right capacity to start with. You could use an array
    // instead if you wanted.
    List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);

    for (int start = 0; start < text.length(); start += size) {
        ret.add(text.substring(start, Math.min(text.length(), start + size)));
    }
    return ret;
}

Note: this assumes a 1:1 mapping of UTF-16 code unit ( char , effectively) with "character". That assumption breaks down for characters outside the Basic Multilingual Plane, such as emoji, and (depending on how you want to count things) combining characters.

I don't think it's really worth using a regex for this.

EDIT: My reasoning for not using a regex:

  • This doesn't use any of the real pattern matching of regexes. It's just counting.
  • I suspect the above will be more efficient, although in most cases it won't matter
  • If you need to use variable sizes in different places, you've either got repetition or a helper function to build the regex itself based on a parameter - ick.
  • The regex provided in another answer firstly didn't compile (invalid escaping), and then didn't work. My code worked first time. That's more a testament to the usability of regexes vs plain code, IMO.

This is very easy with Google Guava :

for(final String token :
    Splitter
        .fixedLength(4)
        .split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

Output:

Theq
uick
brow
nfox
jump
s

Or if you need the result as an array, you can use this code:

String[] tokens =
    Iterables.toArray(
        Splitter
            .fixedLength(4)
            .split("Thequickbrownfoxjumps"),
        String.class
    );

Reference:

Note: Splitter construction is shown inline above, but since Splitters are immutable and reusable, it's a good practice to store them in constants:

private static final Splitter FOUR_LETTERS = Splitter.fixedLength(4);

// more code

for(final String token : FOUR_LETTERS.split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

If you're using Google's guava general-purpose libraries (and quite honestly, any new Java project probably should be), this is insanely trivial with the Splitter class:

for (String substring : Splitter.fixedLength(4).split(inputString)) {
    doSomethingWith(substring);
}

and that's it . Easy as!

public static String[] split(String src, int len) {
    String[] result = new String[(int)Math.ceil((double)src.length()/(double)len)];
    for (int i=0; i<result.length; i++)
        result[i] = src.substring(i*len, Math.min(src.length(), (i+1)*len));
    return result;
}
public String[] splitInParts(String s, int partLength)
{
    int len = s.length();

    // Number of parts
    int nparts = (len + partLength - 1) / partLength;
    String parts[] = new String[nparts];

    // Break into parts
    int offset= 0;
    int i = 0;
    while (i < nparts)
    {
        parts[i] = s.substring(offset, Math.min(offset + partLength, len));
        offset += partLength;
        i++;
    }

    return parts;
}

Here's a one-liner version which uses Java 8 IntStream to determine the indexes of the slice beginnings:

String x = "Thequickbrownfoxjumps";

String[] result = IntStream
                    .iterate(0, i -> i + 4)
                    .limit((int) Math.ceil(x.length() / 4.0))
                    .mapToObj(i ->
                        x.substring(i, Math.min(i + 4, x.length())
                    )
                    .toArray(String[]::new);

A StringBuilder version:

public static List<String> getChunks(String s, int chunkSize)
{
 List<String> chunks = new ArrayList<>();
 StringBuilder sb = new StringBuilder(s);

while(!(sb.length() ==0)) 
{           
   chunks.add(sb.substring(0, chunkSize));
   sb.delete(0, chunkSize);

}
return chunks;

}

I'd rather this simple solution:

String content = "Thequickbrownfoxjumps";
while(content.length() > 4) {
    System.out.println(content.substring(0, 4));
    content = content.substring(4);
}
System.out.println(content);

You can use substring from String.class (handling exceptions) or from Apache lang commons (it handles exceptions for you)

static String   substring(String str, int start, int end) 

Put it inside a loop and you are good to go.

i use the following java 8 solution:

public static List<String> splitString(final String string, final int chunkSize) {
  final int numberOfChunks = (string.length() + chunkSize - 1) / chunkSize;
  return IntStream.range(0, numberOfChunks)
                  .mapToObj(index -> string.substring(index * chunkSize, Math.min((index + 1) * chunkSize, string.length())))
                  .collect(toList());
}

In case you want to split the string equally backwards, ie from right to left, for example, to split 1010001111 to [10, 1000, 1111] , here's the code:

/**
 * @param s         the string to be split
 * @param subLen    length of the equal-length substrings.
 * @param backwards true if the splitting is from right to left, false otherwise
 * @return an array of equal-length substrings
 * @throws ArithmeticException: / by zero when subLen == 0
 */
public static String[] split(String s, int subLen, boolean backwards) {
    assert s != null;
    int groups = s.length() % subLen == 0 ? s.length() / subLen : s.length() / subLen + 1;
    String[] strs = new String[groups];
    if (backwards) {
        for (int i = 0; i < groups; i++) {
            int beginIndex = s.length() - subLen * (i + 1);
            int endIndex = beginIndex + subLen;
            if (beginIndex < 0)
                beginIndex = 0;
            strs[groups - i - 1] = s.substring(beginIndex, endIndex);
        }
    } else {
        for (int i = 0; i < groups; i++) {
            int beginIndex = subLen * i;
            int endIndex = beginIndex + subLen;
            if (endIndex > s.length())
                endIndex = s.length();
            strs[i] = s.substring(beginIndex, endIndex);
        }
    }
    return strs;
}

Here is a one liner implementation using Java8 streams:

String input = "Thequickbrownfoxjumps";
final AtomicInteger atomicInteger = new AtomicInteger(0);
Collection<String> result = input.chars()
                                    .mapToObj(c -> String.valueOf((char)c) )
                                    .collect(Collectors.groupingBy(c -> atomicInteger.getAndIncrement() / 4
                                                                ,Collectors.joining()))
                                    .values();

It gives the following output:

[Theq, uick, brow, nfox, jump, s]

Java 8 solution (like this but a bit simpler):

public static List<String> partition(String string, int partSize) {
  List<String> parts = IntStream.range(0, string.length() / partSize)
    .mapToObj(i -> string.substring(i * partSize, (i + 1) * partSize))
    .collect(toList());
  if ((string.length() % partSize) != 0)
    parts.add(string.substring(string.length() / partSize * partSize));
  return parts;
}

Here is my version based on RegEx and Java 8 streams. It's worth to mention that Matcher.results() method is available since Java 9.

Test included.

public static List<String> splitString(String input, int splitSize) {
    Matcher matcher = Pattern.compile("(?:(.{" + splitSize + "}))+?").matcher(input);
    return matcher.results().map(MatchResult::group).collect(Collectors.toList());
}

@Test
public void shouldSplitStringToEqualLengthParts() {
    String anyValidString = "Split me equally!";
    String[] expectedTokens2 = {"Sp", "li", "t ", "me", " e", "qu", "al", "ly"};
    String[] expectedTokens3 = {"Spl", "it ", "me ", "equ", "all"};

    Assert.assertArrayEquals(expectedTokens2, splitString(anyValidString, 2).toArray());
    Assert.assertArrayEquals(expectedTokens3, splitString(anyValidString, 3).toArray());
}

The simplest solution is:

  /**
   * Slices string by passed - in slice length.
   * If passed - in string is null or slice length less then 0 throws IllegalArgumentException.
   * @param toSlice string to slice
   * @param sliceLength slice length
   * @return List of slices
   */
  public static List<String> stringSlicer(String toSlice, int sliceLength) {
    if (toSlice == null) {
      throw new IllegalArgumentException("Passed - in string is null");
    }
    if (sliceLength < 0) {
      throw new IllegalArgumentException("Slice length can not be less then 0");
    }
    if (toSlice.isEmpty() || toSlice.length() <= sliceLength) {
      return List.of(toSlice);
    }
    
   return Arrays.stream(toSlice.split(String.format("(?s)(?<=\\G.{%d})", sliceLength))).collect(Collectors.toList());
  }
    import static java.lang.System.exit;
   import java.util.Scanner;
   import Java.util.Arrays.*;


 public class string123 {

public static void main(String[] args) {


  Scanner sc=new Scanner(System.in);
    System.out.println("Enter String");
    String r=sc.nextLine();
    String[] s=new String[10];
    int len=r.length();
       System.out.println("Enter length Of Sub-string");
    int l=sc.nextInt();
    int last;
    int f=0;
    for(int i=0;;i++){
        last=(f+l);
            if((last)>=len) last=len;
        s[i]=r.substring(f,last);
     // System.out.println(s[i]);

      if (last==len)break;
       f=(f+l);
    } 
    System.out.print(Arrays.tostring(s));
    }}

Result

 Enter String
 Thequickbrownfoxjumps
 Enter length Of Sub-string
 4

 ["Theq","uick","brow","nfox","jump","s"]

I asked @Alan Moore in a comment to the accepted solution<\/a> how strings with newlines could be handled. He suggested using DOTALL.

public void regexDotAllExample() throws UnsupportedEncodingException {
    final String input = "The\nquick\nbrown\r\nfox\rjumps";
    final String regex = "(?<=\\G.{4})";

    Pattern splitByLengthPattern;
    String[] split;

    splitByLengthPattern = Pattern.compile(regex);
    split = splitByLengthPattern.split(input);
    System.out.println("---- Without DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is a single entry longer than the desired split size:
    ---- Without DOTALL ----
    [Idx: 0, length: 26] - [B@17cdc4a5
     */


    //DOTALL suggested in Alan Moores comment on SO: https://stackoverflow.com/a/3761521/1237974
    splitByLengthPattern = Pattern.compile(regex, Pattern.DOTALL);
    split = splitByLengthPattern.split(input);
    System.out.println("---- With DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is as desired 7 entries with each entry having a max length of 4:
    ---- With DOTALL ----
    [Idx: 0, length: 4] - [B@77b22abc
    [Idx: 1, length: 4] - [B@5213da08
    [Idx: 2, length: 4] - [B@154f6d51
    [Idx: 3, length: 4] - [B@1191ebc5
    [Idx: 4, length: 4] - [B@30ddb86
    [Idx: 5, length: 4] - [B@2c73bfb
    [Idx: 6, length: 2] - [B@6632dd29
     */

}

Another brute force solution could be,

    String input = "thequickbrownfoxjumps";
    int n = input.length()/4;
    String[] num = new String[n];

    for(int i = 0, x=0, y=4; i<n; i++){
    num[i]  = input.substring(x,y);
    x += 4;
    y += 4;
    System.out.println(num[i]);
    }

Where the code just steps through the string with substrings

@Test
public void regexSplit() {
    String source = "Thequickbrownfoxjumps";
    // define matcher, any char, min length 1, max length 4
    Matcher matcher = Pattern.compile(".{1,4}").matcher(source);
    List<String> result = new ArrayList<>();
    while (matcher.find()) {
        result.add(source.substring(matcher.start(), matcher.end()));
    }
    String[] expected = {"Theq", "uick", "brow", "nfox", "jump", "s"};
    assertArrayEquals(result.toArray(), expected);
}
public static String[] split(String input, int length) throws IllegalArgumentException {

    if(length == 0 || input == null)
        return new String[0];

    int lengthD = length * 2;

    int size = input.length();
    if(size == 0)
        return new String[0];

    int rep = (int) Math.ceil(size * 1d / length);

    ByteArrayInputStream stream = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_16LE));

    String[] out = new String[rep];
    byte[]  buf = new byte[lengthD];

    int d = 0;
    for (int i = 0; i < rep; i++) {

        try {
            d = stream.read(buf);
        } catch (IOException e) {
            e.printStackTrace();
        }

        if(d != lengthD)
        {
            out[i] = new String(buf,0,d, StandardCharsets.UTF_16LE);
            continue;
        }

        out[i] = new String(buf, StandardCharsets.UTF_16LE);
    }
    return out;
}
public static List<String> getSplittedString(String stringtoSplit,
            int length) {

        List<String> returnStringList = new ArrayList<String>(
                (stringtoSplit.length() + length - 1) / length);

        for (int start = 0; start < stringtoSplit.length(); start += length) {
            returnStringList.add(stringtoSplit.substring(start,
                    Math.min(stringtoSplit.length(), start + length)));
        }

        return returnStringList;
    }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM