
Regex, extract string not between symbols

I would like to split a text on ',' but not on commas that lie between parentheses or chevrons (angle brackets).

For example:

The string "test.toto, test->toto.value(), sizeof(test, toto)" should return this list '[test.toto, test->toto.value(), sizeof(test, toto)]'

The string "test.toto, test.value(), toto" should return this list '[test.toto, test.value(), toto]'

The string "toto, toto<titi, tutu>&, titi" should return this list '[toto, toto<titi, tutu>&, titi]'

So far, I have written this regex to match those commas:

',(?![^(]*\))(?![^<>]*\>)' 

but it doesn't produce the correct result for the first example.
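A quick way to see where it goes wrong is to run the split on the first example. The sketch below is only an illustration (the class and method names are made up here); it feeds the pattern above to `String.split`. The first comma is not matched because the `>` of `->` satisfies the second lookahead, so that comma is treated as if it sat inside chevrons.

```java
import java.util.Arrays;
import java.util.List;

final class LookaheadSplitDemo {
    // The pattern from the question, unchanged (backslashes doubled for a Java string literal).
    static final String COMMA = ",(?![^(]*\\))(?![^<>]*\\>)";

    // Hypothetical helper: split the input on every comma the pattern matches.
    static List<String> split(String input) {
        return Arrays.asList(input.split(COMMA));
    }

    public static void main(String[] args) {
        // Print each part in brackets so leading spaces stay visible.
        for (String part : split("test.toto, test->toto.value(), sizeof(test, toto)")) {
            System.out.println("[" + part + "]");
        }
        // prints:
        // [test.toto, test->toto.value()]
        // [ sizeof(test, toto)]
    }
}
```

The second example has no `->`, which is why it splits as expected (apart from the leading spaces that a plain split leaves in place).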

Does anyone have an idea?

Thanks in advance!

I created a pattern that matches the groups separated by commas instead of trying to match the commas themselves. So the Java code does not split on the separator but collects all matching groups:

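import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import junit.framework.TestCase;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class RegexTest {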

    private final String testString;
    private final Collection<String> expectedResult;


    public RegexTest(String testString, String[] expectedResult) {
        this.testString = testString;
        this.expectedResult = Arrays.asList(expectedResult);
    }

    private Collection<String> findMatchedWords(String sentence) {
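        // A token is one or more of: a <...> group, a (...) group,
        // or any single character other than a comma or a space.
        Pattern pattern = Pattern.compile("((\\<.*?\\>|\\(.*?\\)|[^, ])+)");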

        Matcher matcher = pattern.matcher(sentence);
        List<String> matches = new ArrayList<>();

        while(matcher.find()){
            matches.add(matcher.group());
        }
        return matches;
    }


    @Test
    public void testPattern() {         
        Collection<String> actualResult = findMatchedWords(testString);

        TestCase.assertEquals(expectedResult, actualResult);
    }


    @Parameters
    public static Iterable<?> getTestParameters() {
        Object[][] parameters = {
                {"test.toto, test.value(), toto", new String[]  { "test.toto", "test.value()", "toto" }},
                {"test.toto, test->toto.value(), sizeof(test, toto)", new String[] { "test.toto", "test->toto.value()", "sizeof(test, toto)" }},
                {"toto, toto<titi, tutu>&, titi", new String[]  { "toto", "toto<titi, tutu>&", "titi" }}
        };
        return Arrays.asList(parameters);
    }
}

EDIT: I had misread the OP's example containing < and >, but it's fixed now.
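One property of this pattern is worth knowing: because `[^, ]` excludes spaces, a token that contains a space outside brackets (say `unsigned int x`) comes back as several matches. The following is a hedged sketch of a variant, not the pattern from this answer: it uses `[^,]` and trims each match instead. The class and method names are illustrative only, and like the original it does not handle nested brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenMatchDemo {
    // Variant pattern (an assumption of this sketch): a <...> group, a (...) group,
    // or any character except a comma, repeated one or more times.
    private static final Pattern TOKEN = Pattern.compile("(?:<.*?>|\\(.*?\\)|[^,])+");

    // Collect every match and strip the whitespace left around the separators.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Print one token per line so commas inside tokens are unambiguous.
        for (String token : tokens("unsigned int x, sizeof(test, toto)")) {
            System.out.println(token);
        }
    }
}
```

The order of the alternatives matters here: the bracketed groups are tried before `[^,]`, so an opening `(` or `<` is consumed together with its closing partner whenever one exists.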

I wrote this method, which does the job:

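import java.util.ArrayList;
import java.util.List;

// Splits on commas that are outside parentheses and outside chevrons.
public static List<String> splitByUpperComma(String toSplit) {
    int parenthesisCount = 0;
    boolean innerChevron = false;
    int pos = 0;
    // Positions of the top-level commas found in the input.
    ArrayList<Integer> indexes = new ArrayList<Integer>();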

    for (char currentChar : toSplit.toCharArray()) {
        if (currentChar == '(') {
            parenthesisCount++;
        } else if (currentChar == ')') {
            parenthesisCount--;
        } else if (currentChar == '<') {
            innerChevron = true;
        } else if (currentChar == '>') {
            innerChevron = false;
        } else if (currentChar == ',' && !innerChevron && parenthesisCount == 0) {
            indexes.add(pos);
        }
        pos++;
    }

    // Cut the input at the recorded positions; trim() drops the space after each comma.
    ArrayList<String> splittedString = new ArrayList<String>();
    int previousIndex = 0;
    for (Integer idx : indexes) {
        splittedString.add(toSplit.substring(previousIndex, idx).trim());
        previousIndex = idx + 1;
    }
    splittedString.add(toSplit.substring(previousIndex, toSplit.length()).trim());

    return splittedString;
}

But it's not a regex.
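If nested chevrons matter (for example `map<pair<int, int>, int>`), the boolean flag above is cleared by the first `>`, so a later comma inside the outer chevrons would be split. Below is a minimal sketch of a variant that counts depth for both bracket kinds. The names are hypothetical, and it assumes `<` and `>` are never used as comparison or shift operators in the input. The depth is clamped at zero so that the `>` of `->` does not push it negative.

```java
import java.util.ArrayList;
import java.util.List;

final class DepthSplitDemo {
    // Split on commas that sit at depth zero for both parentheses and chevrons.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int parenDepth = 0;
        int angleDepth = 0;
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            switch (input.charAt(i)) {
                case '(': parenDepth++; break;
                case ')': parenDepth = Math.max(0, parenDepth - 1); break;
                case '<': angleDepth++; break;
                // Clamp at zero: a stray '>' (as in "->") must not go negative.
                case '>': angleDepth = Math.max(0, angleDepth - 1); break;
                case ',':
                    if (parenDepth == 0 && angleDepth == 0) {
                        parts.add(input.substring(start, i).trim());
                        start = i + 1;
                    }
                    break;
                default: break;
            }
        }
        // Whatever follows the last top-level comma is the final part.
        parts.add(input.substring(start).trim());
        return parts;
    }

    public static void main(String[] args) {
        // Print one part per line so commas inside parts are unambiguous.
        for (String part : splitTopLevel("f(g(a, b), c), map<pair<int, int>, int>, x->y")) {
            System.out.println(part);
        }
    }
}
```

Counting depth is what a regex cannot easily do, which is why nested input tends to push this problem towards a small scanner like this one.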

I can't check it because I'm not on a computer, but give this a try:

(?:[,]?)([^,]*([(<].*?[)>])?[^,]*)

You may have to escape the parentheses inside the square brackets.
