Regex to match words between single or double quotes in a string

I'm looking for a regex that gives me the following results:

  • it needs to group words surrounded by single or double quotes
  • it needs to keep a single quote in the output when there is no matching single quote in the string
  • when text is not surrounded by single or double quotes, it should split on spaces

I currently have:

Pattern pattern = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");

... but it does not handle all of the following examples. Can anyone help me with this one?

Examples:

  • foo bar
    • group1: foo
    • group2: bar
    • description: split on space
  • "foo bar"
    • group1: foo bar
    • description: surrounded by double quotes so group foo and bar, but don't print double quotes
  • 'foo bar'
    • group1: foo bar
    • description: same as above, but with single quotes
  • 'foo bar
    • group1: 'foo
    • group2: bar
    • description: split on space and keep single quote
  • "'foo bar"
    • group1: 'foo bar
    • description: surrounded by double quotes so group 'foo and bar and keep single quote
  • foo bar'
    • group1: foo
    • group2: bar'
  • foo bar"
    • group1: foo
    • group2: bar"
  • "foo bar" "stack overflow"
    • group1: foo bar
    • group2: stack overflow
  • "foo' bar" "stack overflow" how do you do
    • group1: foo' bar
    • group2: stack overflow
    • group3: how
    • group4: do
    • group5: you
    • group6: do
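
To make the failing cases concrete, here is a minimal sketch that runs the pattern from the question against the inputs with a lone quote. The class name CurrentPatternDemo and the helper tokens are made up for illustration. Because the first alternative excludes quote characters and the quoted alternatives need a closing quote, an unmatched quote is skipped rather than kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrentPatternDemo {
    // The pattern from the question, unchanged.
    static final Pattern CURRENT =
            Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");

    // Hypothetical helper: collects one token per match, preferring quoted content.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CURRENT.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                result.add(m.group(1));   // content between double quotes
            } else if (m.group(2) != null) {
                result.add(m.group(2));   // content between single quotes
            } else {
                result.add(m.group());    // bare word (can never contain a quote)
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Each of these prints [foo, bar]: the lone quote is lost.
        for (String s : new String[] {"'foo bar", "foo bar'", "foo bar\""}) {
            System.out.println(s + " -> " + tokens(s));
        }
    }
}
```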

I'm not sure whether you can do this with a single Matcher.matches() call, but you can do it with a loop.
This snippet handles all the cases you mention above by calling Matcher.find() repeatedly:

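import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Alternatives are tried left to right: "double-quoted", 'single-quoted', then any run of non-whitespace.
Pattern pattern = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");
List<String> testStrings = Arrays.asList("foo bar", "\"foo bar\"","'foo bar'", "'foo bar", "\"'foo bar\"", "foo bar'", "foo bar\"", "\"foo bar\" \"stack overflow\"", "\"foo' bar\" \"stack overflow\" how do you do");
for (String testString : testStrings) {
    int count = 1;
    Matcher matcher = pattern.matcher(testString);
    System.out.format("* %s%n", testString);
    while (matcher.find()) {
        // Prefer the quoted content (group 1 or 2); otherwise fall back to the whole match.
        String token = matcher.group(1) != null ? matcher.group(1)
                : matcher.group(2) != null ? matcher.group(2)
                : matcher.group();
        System.out.format("\t* group%d: %s%n", count++, token);
    }
}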

This prints:

* foo bar
    * group1: foo
    * group2: bar
* "foo bar"
    * group1: foo bar
* 'foo bar'
    * group1: foo bar
* 'foo bar
    * group1: 'foo
    * group2: bar
* "'foo bar"
    * group1: 'foo bar
* foo bar'
    * group1: foo
    * group2: bar'
* foo bar"
    * group1: foo
    * group2: bar"
* "foo bar" "stack overflow"
    * group1: foo bar
    * group2: stack overflow
* "foo' bar" "stack overflow" how do you do
    * group1: foo' bar
    * group2: stack overflow
    * group3: how
    * group4: do
    * group5: you
    * group6: do
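
If you want the tokens as a list instead of printed output, the same loop can be wrapped in a small helper. This is only a sketch: the class name QuoteTokenizer and the method tokenize are invented for illustration, while the pattern is the one used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteTokenizer {
    // Same pattern as above: double-quoted | single-quoted | run of non-whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");

    // Hypothetical helper: returns one entry per match, with surrounding quotes removed.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add(m.group(1));   // content between double quotes
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));   // content between single quotes
            } else {
                tokens.add(m.group());    // bare word; an unmatched quote stays in it
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\"foo' bar\" \"stack overflow\" how do you do"));
        System.out.println(tokenize("'foo bar"));
    }
}
```

One thing to be aware of: because the quoted alternatives use + rather than *, an empty pair such as "" is not treated as a quoted group and falls through to \S+, so it comes back as the literal token "". Whether that is what you want depends on your input.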

Any time you have pairings (be it quotes or braces), you leave the realm of regex and enter the realm of grammars, which need a parser.

I'll leave you with the ultimate answer to this question

UPDATE:

A little more explanation.

A grammar is usually expressed as:

construct -> [set of constructs or terminals]

For example, for quotes:

doublequotedstring := " simplequotedstring "
simplequotedstring := string ' string
                      | string '
                      | ' string
                      | '

This is a simple example; you can find proper examples of grammars for quoting on the internet.

I have used aflex and ayacc for this (for Ada; for Java there are jflex and jjacc). You pass the list of identifiers to aflex, generate an output, then pass that output and the grammar to ayacc, and you get an Ada parser. It has been a long time since I used them, so I do not know whether more streamlined solutions exist, but fundamentally they will need the same input.
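
For an input this small, a hand-written scanner can stand in for a generated parser. The sketch below is not the aflex/ayacc route described above; it is a plain Java loop (class QuoteScanner and method scan are made-up names) that applies the rules from the question directly: a quote opens a group only if a matching closing quote exists later in the string, otherwise it is treated as part of an ordinary word. It assumes whitespace as defined by Character.isWhitespace.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteScanner {
    // Hypothetical hand-written scanner; no regex involved.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                       // skip separators between tokens
                continue;
            }
            if (c == '"' || c == '\'') {
                // Look ahead for the same quote character; require non-empty content.
                int close = input.indexOf(c, i + 1);
                if (close > i + 1) {
                    tokens.add(input.substring(i + 1, close));
                    i = close + 1;
                    continue;
                }
            }
            // Bare word: read up to the next whitespace, keeping any stray quote.
            int end = i;
            while (end < input.length() && !Character.isWhitespace(input.charAt(end))) {
                end++;
            }
            tokens.add(input.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("\"foo' bar\" \"stack overflow\" how do you do"));
        System.out.println(scan("'foo bar"));
    }
}
```

The advantage of writing it out this way is that each rule sits in its own branch, so adding something like escaped quotes later means changing one branch instead of reworking a pattern.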
