
Split at every i-th and j-th char

I need to split a string alternately after every i-th and j-th character, where i and j are input parameters. For example, given the input

String s = "1234567890abcdef";
int i = 2;
int j = 3;

I want my output to be an array of:

[12, 345, 67, 890, ab, cde, f]

I found a compact regex that splits at every n-th char. Here is an example for n = 3, using "(?<=\\G...)" or "(?<=\\G.{3})":

String s = "1234567890abcdef";
int n = 3;
System.out.println(Arrays.toString(s.split("(?<=\\G.{"+n+"})")));

//output: [123, 456, 789, 0ab, cde, f]

How can I modify the above regex to split after every 2nd and 3rd char alternately?

A naive chaining like "(?<=\\G.{2})(?<=\\G.{3})" did not work.

I don't think you can do this with split(), because every match would have to be aware of the pattern previously matched.
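To see why the naive chaining from the question fails: both lookbehinds are asserted at the same position, and \G marks a single point (the end of the previous match), so that position would have to be 2 and 3 characters past it at the same time. A minimal check, assuming the sample string from the question (the class and method names are only illustrative):

```java
import java.util.Arrays;

class ChainedLookbehindDemo {
    // Hypothetical helper: applies the chained pattern from the question.
    static String[] chainedSplit(String s) {
        // Both lookbehinds are checked at the same position, so that position
        // would need to be 2 AND 3 characters past \G at once. It never matches.
        return s.split("(?<=\\G.{2})(?<=\\G.{3})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(chainedSplit("1234567890abcdef")));
        // prints [1234567890abcdef]
    }
}
```

Since the pattern never matches, split() returns the input unchanged as a single element.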

If you don't want to manually iterate over the string's characters, you can use something like this:

Matcher m = Pattern.compile("(.{0,2})(.{0,3})").matcher("1234567890abcdef");
List<String> list = new ArrayList<>();
while (m.find()) {
  for (int i = 1; i <= 2; i++) {
    if (!m.group(i).isEmpty()) {
      list.add(m.group(i));
    }
  }
}
System.out.println(list);  // prints [12, 345, 67, 890, ab, cde, f]
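The snippet above hard-codes 2 and 3. A parameterized sketch could look like the following; the method name splitAlternating is made up for illustration, i and j are assumed to be positive, and Pattern.DOTALL is an addition of mine so that line breaks in the input are kept as ordinary characters instead of being skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternatingSplit {
    // Hypothetical helper: alternates chunks of length i and j (both assumed > 0).
    static List<String> splitAlternating(String s, int i, int j) {
        // Group 1 takes up to i chars, group 2 up to j chars.
        // DOTALL (an assumption, not in the original) lets '.' match line breaks too.
        Pattern p = Pattern.compile("(.{0," + i + "})(.{0," + j + "})", Pattern.DOTALL);
        Matcher m = p.matcher(s);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            for (int g = 1; g <= 2; g++) {
                // Near the end of the input a group may be empty; skip it.
                if (!m.group(g).isEmpty()) {
                    parts.add(m.group(g));
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitAlternating("1234567890abcdef", 2, 3));
        // prints [12, 345, 67, 890, ab, cde, f]
    }
}
```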

O(n) solution by iterating over the characters:

private static List<String> splitByPattern(String str, List<Integer> pattern) {
    int currentPatternIndex = 0;
    int iterationsTillNextSplit = pattern.get(currentPatternIndex);
    StringBuilder stringBuilder = new StringBuilder();
    List<String> strs = new ArrayList<>();

    for (char c : str.toCharArray()) {
        if (iterationsTillNextSplit == 0) { // Time to split
            strs.add(stringBuilder.toString());
            stringBuilder = new StringBuilder();
            iterationsTillNextSplit = pattern.get(++currentPatternIndex % pattern.size());
        }

        stringBuilder.append(c);
        iterationsTillNextSplit--;
    }

    strs.add(stringBuilder.toString());

    return strs;
}

Usage:

System.out.println(splitByPattern("1234567890abcdef", Arrays.asList(2, 3)));

Output:

[12, 345, 67, 890, ab, cde, f]

Here is another simple solution which doesn't make use of regular expressions:

String s = "1234567890abcdef";
int strLen = s.length();
List<String> list = new ArrayList<>();
for (int lastIndex = 0; lastIndex < strLen;) {
    int numChars = list.size() % 2 == 0 ? 2 : 3; // this alternates substrings of length 2 and 3
    if (strLen - lastIndex < numChars)
        list.add(s.substring(lastIndex));
    else
        list.add(s.substring(lastIndex, lastIndex+numChars));
    lastIndex += numChars;
}
System.out.println(list);  // prints [12, 345, 67, 890, ab, cde, f]
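The same substring idea can be turned into a reusable method that accepts any cycle of lengths instead of the fixed 2 and 3. This is only a sketch: the name splitByLengths and the varargs signature are my own choices, and every length is assumed to be positive (the method rejects anything else, because a zero length would never advance through the string).

```java
import java.util.ArrayList;
import java.util.List;

class SubstringSplit {
    // Hypothetical helper: cuts s into chunks whose lengths cycle through `lengths`.
    static List<String> splitByLengths(String s, int... lengths) {
        if (lengths.length == 0) {
            throw new IllegalArgumentException("at least one length is required");
        }
        for (int len : lengths) {
            if (len <= 0) {
                throw new IllegalArgumentException("lengths must be positive");
            }
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int k = 0; start < s.length(); k++) {
            int len = lengths[k % lengths.length];
            // Clamp the last chunk to the end of the string.
            int end = Math.min(start + len, s.length());
            parts.add(s.substring(start, end));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitByLengths("1234567890abcdef", 2, 3));
        // prints [12, 345, 67, 890, ab, cde, f]
    }
}
```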

There is a somewhat hacky way to split() using regex, but as @horcrux mentioned:

every match should be aware of the pattern previously matched

You would have to:

a) First insert a marker that later matches can refer to, by adding an "unlikely" character or string (e.g. a line break) after every i + j characters:

s = s.replaceAll("(.{5})", "$1\n");

This turns the string into 12345\n67890\nabcde\nf

b) Now you can split using lookarounds:

String[] result = s.split("(?<=\\G.{2})(?=.{3}\n)|\n");

The first alternative is a zero-length match that has exactly i characters between the end of the previous match and the current position, (?<=\G.{2}), and is followed by j characters and then your "special" pattern, (?=.{3}\n). The second alternative simply matches the "special" pattern itself.

This makes the split alternate between the position after i characters and the match of the "special" pattern.

Complete one-liner, using the hash # as the special pattern (for educational purposes only):

System.out.println(Arrays.toString(s.replaceAll("(.{"+(i+j)+"})", "$1#").split("(?<=\\G.{"+i+"})(?=.{"+j+"}#)|#")));
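Wrapping the one-liner in a method makes its assumptions easier to see. The sketch below assumes the input contains neither '#' nor line terminators and that i and j are positive; the name splitWithMarker is made up for illustration.

```java
import java.util.Arrays;

class MarkerSplitDemo {
    // Hypothetical helper around the one-liner above.
    // Assumes s contains neither '#' nor line terminators, and i, j > 0.
    static String[] splitWithMarker(String s, int i, int j) {
        // Step a) append '#' after every complete group of i + j characters.
        String marked = s.replaceAll("(.{" + (i + j) + "})", "$1#");
        // Step b) split after i characters when j characters and '#' follow,
        // or on the marker itself.
        return marked.split("(?<=\\G.{" + i + "})(?=.{" + j + "}#)|#");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithMarker("1234567890abcdef", 2, 3)));
        // prints [12, 345, 67, 890, ab, cde, f]
        System.out.println(Arrays.toString(splitWithMarker("1234567890abc", 2, 3)));
        // prints [12, 345, 67, 890, abc]
    }
}
```

The second call shows a caveat of this approach: a trailing remainder that is longer than i but shorter than i + j gets no marker, so the lookahead never succeeds there and the remainder stays in one piece ("abc" instead of "ab" and "c").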

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.
