
Java String: split String

I have this String:

 String string="NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";

How can I split it into an array at every fourth comma? I would like something like this:

     String[] a=string.split("d{4}");
     a[0]="NNP,PERSON,true,?";
     a[1]="IN,O,false,pobj";
     a[2]="NNP,ORGANIZATION,true,?";
     a[3]="p";

Keep it simple; there is no need for a regex. Simply count the commas, and each time four have been found, use String.substring() to extract the value.

Finally, store the values in an ArrayList<String> instead of printing them.

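    String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";

    int count = 0;       // commas seen since the last cut
    int beginIndex = 0;  // start of the current group
    int endIndex = 0;    // index of the character being examined
    for (char ch : string.toCharArray()) {
        if (ch == ',') {
            count++;
        }
        if (count == 4) {
            // endIndex is the position of the fourth comma, which is excluded
            System.out.println(string.substring(beginIndex, endIndex));
            beginIndex = endIndex + 1; // the next group starts after the comma
            count = 0;
        }
        endIndex++;
    }

    // Print whatever is left after the last complete group
    if (beginIndex < endIndex) {
        System.out.println(string.substring(beginIndex, endIndex));
    }

output:

    NNP,PERSON,true,?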
    IN,O,false,pobj
    NNP,ORGANIZATION,true,?
    p
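
The last step of this answer, storing the pieces in an ArrayList<String> rather than printing them, could look roughly like the sketch below. The class name CommaGroups and the helper splitEveryNthComma are made up for illustration and are not part of the original answer; the helper assumes n is at least 1.

```java
import java.util.ArrayList;
import java.util.List;

class CommaGroups {
    // Hypothetical helper: cuts s at every n-th comma and collects the pieces.
    static List<String> splitEveryNthComma(String s, int n) {
        List<String> parts = new ArrayList<>();
        int count = 0;       // commas seen since the last cut
        int beginIndex = 0;  // start of the current group
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',' && ++count == n) {
                parts.add(s.substring(beginIndex, i)); // the n-th comma is dropped
                beginIndex = i + 1;
                count = 0;
            }
        }
        // Keep the trailing, possibly shorter, group
        if (beginIndex < s.length()) {
            parts.add(s.substring(beginIndex));
        }
        return parts;
    }

    public static void main(String[] args) {
        String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
        for (String part : splitEveryNthComma(string, 4)) {
            System.out.println(part);
        }
    }
}
```

Passing the group size as a parameter keeps the counting logic identical to the loop above while making it reusable for other group sizes.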

If you really have to use split you can use something like

String[] array = string.split("(?<=\\G[^,]{1,100},[^,]{1,100},[^,]{1,100},[^,]{1,100}),");

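The idea is explained in my previous answer on a similar but simpler topic. In short, \G anchors the look-behind at the end of the previous match (or at the start of the input), so the split happens only at a comma that directly follows four non-empty, comma-free fields; the {1,100} bounds stand in for + because Java look-behind generally needs a bounded length.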

Demo:

String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
String[] array = string.split("(?<=\\G[^,]{1,100},[^,]{1,100},[^,]{1,100},[^,]{1,100}),");
for (String s : array)
    System.out.println(s);

output:

NNP,PERSON,true,?
IN,O,false,pobj
NNP,ORGANIZATION,true,?
p

However, if you are not required to use split but still want a regex, I encourage you to use the Pattern and Matcher classes with a simple regex that finds the parts you are interested in, rather than a complicated regex that finds the parts you want to discard. I mean something that matches:

  1. any xxx,xxx,xxx,xxx part, where x is any character other than ,
  2. any xxx, xxx,xxx or xxx,xxx,xxx part at the end of the string (to catch the rest of the data left unmatched by the regex from point 1)

So

Pattern p = Pattern.compile("[^,]+(,[^,]+){3}|[^,]+(,[^,]+){0,2}$");

should do the trick.
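
As a rough illustration of how that pattern might be used with Matcher, here is a minimal sketch. The class name GroupMatcher and the method findGroups are hypothetical names chosen for this example; only the regex itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupMatcher {
    // The regex from the answer: a full group of four fields,
    // or one to three fields at the very end of the input.
    private static final Pattern GROUP =
            Pattern.compile("[^,]+(,[^,]+){3}|[^,]+(,[^,]+){0,2}$");

    // Hypothetical helper: collects every match of GROUP in order.
    static List<String> findGroups(String s) {
        List<String> groups = new ArrayList<>();
        Matcher m = GROUP.matcher(s);
        while (m.find()) {
            groups.add(m.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        String string = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
        for (String group : findGroups(string)) {
            System.out.println(group);
        }
    }
}
```

The comma between two groups is never part of a match, because neither alternative can start at a comma; find() simply skips over it before the next group.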


Another solution, probably the fastest and quite easy to write, is your own parser: iterate over the characters of the string, store them in a buffer, and count the commas seen so far. Each time the count reaches a multiple of 4, write the buffer's content to an array (or better, a dynamic collection such as a list) and clear the buffer. Such a parser can look like this:

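import java.util.ArrayList;
import java.util.List;

public static List<String> parse(String s){
    List<String> tokens = new ArrayList<>();
    StringBuilder sb = new StringBuilder(); // buffer for the current group
    int commaCounter = 0;

    for (char ch: s.toCharArray()){
        if (ch==',' && ++commaCounter == 4){
            // fourth comma: emit the group and start a new one (the comma is dropped)
            tokens.add(sb.toString());
            sb.setLength(0);
            commaCounter = 0;
        }else{
            sb.append(ch);
        }
    }
    // add the trailing, possibly shorter, group
    if (sb.length()>0)
        tokens.add(sb.toString());

    return tokens;
}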

You can convert the List to an array later if you need to, but I would stay with the List.
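
If an array really is required, the conversion can be done with the standard List.toArray call, sketched below. The class name ListToArray and the wrapper method toStringArray are hypothetical and exist only to make the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

class ListToArray {
    // Hypothetical wrapper around the standard List.toArray(T[]) call.
    static String[] toStringArray(List<String> tokens) {
        // Passing an empty array lets toArray allocate one of the right size.
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("NNP,PERSON,true,?", "IN,O,false,pobj", "p");
        String[] array = toStringArray(tokens);
        System.out.println(array.length);
        System.out.println(Arrays.toString(array));
    }
}
```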

Edited. Try this:

import java.util.ArrayList;
import java.util.Arrays;

String str = "NNP,PERSON,true,?,IN,O,false,pobj,NNP,ORGANIZATION,true,?,p";
String[] arr = str.split(",");
ArrayList<String> result = new ArrayList<String>();
// Join the fields back together four at a time; the last group may be shorter.
for (int i = 0; i < arr.length; i += 4) {
    int end = Math.min(i + 4, arr.length);
    result.add(String.join(",", Arrays.copyOfRange(arr, i, end)));
}
for (String part : result) {
    System.out.println(part);
}

output:

    NNP,PERSON,true,?
    IN,O,false,pobj
    NNP,ORGANIZATION,true,?
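    p

import java.util.StringTokenizer;

StringTokenizer tizer = new StringTokenizer(string, ",");
int total = tizer.countTokens();   // read once, before any token is consumed
int count = total / 4;             // number of complete groups of four
int overflowCount = total % 4;     // tokens left over for a final, shorter group
String[] a;
if (overflowCount > 0)
    a = new String[count + 1];
else
    a = new String[count];
int x = 0;
for (; x < count; x++) {
    a[x] = tizer.nextToken() + "," + tizer.nextToken() + "," + tizer.nextToken() + "," + tizer.nextToken();
}
if (overflowCount > 0) {
    // join the remaining tokens without a trailing comma
    StringBuilder rest = new StringBuilder(tizer.nextToken());
    while (tizer.hasMoreTokens()) {
        rest.append(',').append(tizer.nextToken());
    }
    a[x] = rest.toString();
}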
