I would like to split a text on ',' but not on any ',' that sits between parentheses or chevrons (angle brackets).
For example:
The string "test.toto, test->toto.value(), sizeof(test, toto)" should return this list '[test.toto, test->toto.value(), sizeof(test, toto)]'
The string "test.toto, test.value(), toto" should return this list '[test.toto, test.value(), toto]'
The string "toto, toto<titi, tutu>&, titi" should return this list '[toto, toto<titi, tutu>&, titi]'
So far, I have written this regex to match those commas:
',(?![^(]*\))(?![^<>]*\>)'
but it doesn't produce the correct result for the first example.
Does anyone have an idea?
Thanks in advance!
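To see where the pattern from the question goes wrong, it can be run against the first example on its own. The sketch below is only a diagnostic; the class name LookaheadSplitDemo and the helper split are made up for illustration. The second lookahead, (?![^<>]*\>), is satisfied by the '>' of the '->' operator, so the first comma is rejected as if it were inside chevrons.

```java
import java.util.Arrays;

// Hypothetical demo class: applies the question's regex unchanged.
final class LookaheadSplitDemo {
    // The pattern from the question, escaped for a Java string literal.
    static final String COMMA = ",(?![^(]*\\))(?![^<>]*\\>)";

    static String[] split(String input) {
        return input.split(COMMA);
    }

    public static void main(String[] args) {
        String input = "test.toto, test->toto.value(), sizeof(test, toto)";
        // Prints two pieces instead of three: the first comma is never matched,
        // because the text after it contains the '>' of "->".
        System.out.println(Arrays.toString(split(input)));
    }
}
```

The comma inside sizeof(...) is correctly skipped; only the comma before 'test->toto' is lost.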
I created a pattern that matches the groups separated by commas instead of trying to match the commas themselves. So the Java code does not split on the separator; it collects every matching group:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import junit.framework.TestCase;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class RegexTest {
    private final String testString;
    private final Collection<String> expectedResult;

    public RegexTest(String testString, String[] expectedResult) {
        this.testString = testString;
        this.expectedResult = Arrays.asList(expectedResult);
    }

    private Collection<String> findMatchedWords(String sentence) {
        // An item is a run of: a <...> group, a (...) group, or any char except ',' and ' '.
        Pattern pattern = Pattern.compile("((\\<.*?\\>|\\(.*?\\)|[^, ])+)");
        Matcher matcher = pattern.matcher(sentence);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    @Test
    public void testPattern() {
        Collection<String> actualResult = findMatchedWords(testString);
        TestCase.assertEquals(expectedResult, actualResult);
    }

    @Parameters
    public static Iterable<?> getTestParameters() {
        Object[][] parameters = {
            {"test.toto, test.value(), toto", new String[] { "test.toto", "test.value()", "toto" }},
            {"test.toto, test->toto.value(), sizeof(test, toto)", new String[] { "test.toto", "test->toto.value()", "sizeof(test, toto)" }},
            {"toto, toto<titi, tutu>&, titi", new String[] { "toto", "toto<titi, tutu>&", "titi" }}
        };
        return Arrays.asList(parameters);
    }
}
EDIT: I had misread the OP's example containing < and >, but it's fixed now.
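For use outside JUnit, the same pattern can be wrapped in a small helper. This is a minimal sketch; the class name ItemMatcher and the method name items are invented here. Because the pattern excludes spaces outside brackets, an item such as 'unsigned int' would come back as two items, and the lazy '.*?' stops at the first closing bracket, so nested brackets are not supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects items instead of splitting on commas.
final class ItemMatcher {
    // Same pattern as in the test class: <...> group, (...) group,
    // or any single character that is neither ',' nor ' '.
    private static final Pattern ITEM =
            Pattern.compile("((\\<.*?\\>|\\(.*?\\)|[^, ])+)");

    static List<String> items(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = ITEM.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(items("toto, toto<titi, tutu>&, titi"));
    }
}
```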
I wrote this method, which does the job:
// Requires java.util.ArrayList and java.util.List.
public static List<String> splitByUpperComma(String toSplit) {
    int parenthesisCount = 0;
    boolean innerChevron = false;
    int pos = 0;
    ArrayList<Integer> indexes = new ArrayList<Integer>();
    // First pass: record the position of every comma outside (...) and <...>.
    for (char currentChar : toSplit.toCharArray()) {
        if (currentChar == '(') {
            parenthesisCount++;
        } else if (currentChar == ')') {
            parenthesisCount--;
        } else if (currentChar == '<') {
            innerChevron = true;
        } else if (currentChar == '>') {
            innerChevron = false;
        } else if (currentChar == ',' && !innerChevron && parenthesisCount == 0) {
            indexes.add(pos);
        }
        pos++;
    }
    // Second pass: cut the string at the recorded positions.
    ArrayList<String> splittedString = new ArrayList<String>();
    int previousIndex = 0;
    for (Integer idx : indexes) {
        splittedString.add(toSplit.substring(previousIndex, idx));
        previousIndex = idx + 1;
    }
    splittedString.add(toSplit.substring(previousIndex, toSplit.length()));
    return splittedString;
}
But it's not a regex.
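A variation on the same idea is sketched below; it is not the method above. It does the work in a single pass, counts chevron depth instead of using a boolean so that nested chevrons stay together, and trims each item. The class name TopLevelSplitter is made up, and treating a '>' that directly follows '-' as the arrow operator is an assumption of this sketch. A comparison such as 'a < b' would still be misread as an opening chevron.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-pass variant with depth counters for both bracket kinds.
final class TopLevelSplitter {
    static List<String> split(String input) {
        List<String> items = new ArrayList<>();
        int parenDepth = 0;
        int chevronDepth = 0;
        int start = 0; // index where the current item begins
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                parenDepth++;
            } else if (c == ')') {
                parenDepth = Math.max(0, parenDepth - 1);
            } else if (c == '<') {
                chevronDepth++;
            } else if (c == '>') {
                // Assumption: '>' right after '-' is the "->" operator, not a chevron.
                boolean arrow = i > 0 && input.charAt(i - 1) == '-';
                if (!arrow) {
                    chevronDepth = Math.max(0, chevronDepth - 1);
                }
            } else if (c == ',' && parenDepth == 0 && chevronDepth == 0) {
                items.add(input.substring(start, i).trim());
                start = i + 1;
            }
        }
        items.add(input.substring(start).trim());
        return items;
    }

    public static void main(String[] args) {
        System.out.println(split("test.toto, test->toto.value(), sizeof(test, toto)"));
    }
}
```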
I can't check it because I'm not on a computer, but give this a try:
(?:[,]?)([^,]*([(<].*?[)>])?[^,]*)
You may have to escape the parentheses inside the square brackets.
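Since the pattern above was posted untested, a small harness such as the following can check it against the examples from the question. The class name SuggestedPatternCheck is made up; the sketch takes capture group 1 of each match, trims it, and drops empty matches. Tracing it by hand, the pattern appears to handle the second example but still cuts inside sizeof(test, toto), because the leading [^,]* stops at the first comma whether or not a bracket is open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness for trying out the suggested pattern.
final class SuggestedPatternCheck {
    // The suggested pattern, unchanged; '(' and ')' need no escaping
    // inside a Java character class.
    private static final Pattern ITEM =
            Pattern.compile("(?:[,]?)([^,]*([(<].*?[)>])?[^,]*)");

    static List<String> items(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = ITEM.matcher(input);
        while (matcher.find()) {
            String item = matcher.group(1).trim();
            if (!item.isEmpty()) { // the pattern can also match the empty string
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(items("test.toto, test.value(), toto"));
        System.out.println(items("test.toto, test->toto.value(), sizeof(test, toto)"));
    }
}
```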