
Regular expression to match escaped characters (quotes)

I want to build a simple regex that covers quoted strings, including any escaped quotes within them. For instance,

"This is valid"
"This is \" also \" valid"

Obviously, something like

"([^"]*)"

does not work, because it matches up to the first escaped quote.
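
To make the failure concrete, here is a minimal Java sketch (the class and method names are made up for illustration) that runs the naive pattern against the second example string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveQuoteDemo {
    // The naive regex, written as a Java string literal.
    static final Pattern NAIVE = Pattern.compile("\"([^\"]*)\"");

    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String input) {
        Matcher m = NAIVE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Raw input: "This is \" also \" valid"
        String input = "\"This is \\\" also \\\" valid\"";
        // The match stops at the first escaped quote, not at the real closing quote.
        System.out.println(firstMatch(input));
    }
}
```

The match covers only the text up to and including the first escaped quote, which is the behavior described above.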

What is the correct version?

I suppose the answer would be the same for other escaped characters (by just replacing the respective character).

By the way, I am aware of the "catch-all" regex

"(.*?)"

but I try to avoid it whenever possible, because, not surprisingly, it runs somewhat slower than a more specific one.

Here is one that I've used in the past:

("[^"\\]*(?:\\.[^"\\]*)*")

This will capture quoted strings, along with any escaped quote characters, and exclude anything that doesn't appear in enclosing quotes.

For example, the pattern will capture "This is valid" and "This is \" also \" valid" from this string:

"This is valid" this won't be captured "This is \" also \" valid"

This pattern will not match the string "I don't \"have\" a closing quote , and it allows additional escape codes in the string (e.g., it will match "hello world!\n" ).

Of course, you'll have to escape the pattern to use it in your code, like so:

"(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"

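As a rough check of the claims above, here is a small Java sketch using the escaped form of the pattern. The class and method names are only illustrative, and it assumes you collect matches with `Matcher.find()`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledQuoteDemo {
    // The escaped pattern from above: a quote, a run of ordinary characters,
    // then any number of (escape pair + run of ordinary characters), then a quote.
    static final Pattern QUOTED =
            Pattern.compile("(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");

    // Collects every quoted string found in the input, in order.
    static List<String> findQuoted(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input =
                "\"This is valid\" this won't be captured \"This is \\\" also \\\" valid\"";
        for (String s : findQuoted(input)) {
            System.out.println(s);
        }
        // A string with no closing quote yields no matches at all.
        String unclosed = "\"I don't \\\"have\\\" a closing quote";
        System.out.println(findQuoted(unclosed).isEmpty());
    }
}
```

The first input yields the two quoted strings and skips the text between them; the unclosed input yields an empty list.
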
The problem with all the other answers is that they pass the obvious initial tests but fall short under further scrutiny. For example, all of them assume that the very first quote is not escaped. More importantly, escaping is more complex than a single backslash, because that backslash can itself be escaped. Imagine trying to match a string that ends with a backslash. How would that be possible?

This would be the pattern you are looking for. It doesn't assume that the first quote is the working one, and it will allow for backslashes to be escaped.

(?<!\\)(?:\\{2})*"(?:(?<!\\)(?:\\{2})*\\"|[^"])+(?<!\\)(?:\\{2})*"
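
Here is a hedged Java sketch of how this pattern behaves on a string ending in an escaped backslash versus one ending in an escaped quote. The class and method names are illustrative and not part of the answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedBackslashDemo {
    // The pattern above as a Java string literal. Each lookbehind checks a
    // single character, so it has a fixed length.
    static final Pattern QUOTED = Pattern.compile(
            "(?<!\\\\)(?:\\\\{2})*\""
            + "(?:(?<!\\\\)(?:\\\\{2})*\\\\\"|[^\"])+"
            + "(?<!\\\\)(?:\\\\{2})*\"");

    // Returns the first match, or null when the input has no complete string.
    static String firstMatch(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Raw input: "path\\"  (the content ends with an escaped backslash)
        System.out.println(firstMatch("\"path\\\\\""));
        // Raw input: "this\"   (the last quote is escaped, so nothing closes it)
        System.out.println(firstMatch("\"this\\\""));
    }
}
```

The first call returns the whole input and the second returns null. Note that, as written, the `+` quantifier means the empty string `""` is not matched.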

Try this one. It prefers \" : if that matches, it takes it; otherwise it takes any single character other than " , so only an unescaped " closes the string.

"((?:\\"|[^"])*)"

Once you have matched the string, you'll need to take the first captured group's value and replace \" with " .
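
A minimal Java sketch of both steps, matching and then unescaping. The `unquote` helper is a hypothetical name, and it only handles the \" escape, as described above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PreferEscapedDemo {
    // The pattern above as a Java string literal.
    static final Pattern QUOTED = Pattern.compile("\"((?:\\\\\"|[^\"])*)\"");

    // Returns the content of the first quoted string with each escaped quote
    // replaced by a plain quote, or null when there is no match.
    static String unquote(String input) {
        Matcher m = QUOTED.matcher(input);
        if (!m.find()) {
            return null;
        }
        // String.replace works on literal text, so no regex escaping is needed here.
        return m.group(1).replace("\\\"", "\"");
    }

    public static void main(String[] args) {
        // Raw input: "This is \" also \" valid"
        System.out.println(unquote("\"This is \\\" also \\\" valid\""));
    }
}
```

For the example input, the result is the text between the outer quotes with both escaped quotes turned into plain ones.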

Edit: Fixed grouping logic.

The code below validates comma-separated lists of quoted strings, decimals, and integers with regular expressions.

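/**
 * Validates a comma-separated list of single-quoted strings; a backslash escapes the next character.
 */
public static void commaSeparatedStrings() {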
    String value = "'It\\'s my world', 'Hello World', 'What\\'s up', 'It\\'s just what I expected.'";

    if (value.matches("'([^\'\\\\]*(?:\\\\.[^\'\\\\])*)[\\w\\s,\\.]+'(((,)|(,\\s))'([^\'\\\\]*(?:\\\\.[^\'\\\\])*)[\\w\\s,\\.]+')*")) {
        System.out.println("Valid...");
    } else {
        System.out.println("Invalid...");
    }
}

/**
 * Validates a comma-separated list of decimals, each with an optional leading minus sign.
 */
public static void commaSeparatedDecimals() {
    String value = "-111.00, 22111.00, -1.00";
    // "\\d+([,]|[,\\s]\\d+)*"
    if (value.matches(
            "^([-]?)\\d+\\.\\d{1,10}?(((,)|(,\\s))([-]?)\\d+\\.\\d{1,10}?)*")) {
        System.out.println("Valid...");
    } else {
        System.out.println("Invalid...");
    }
}

/**
 * Validates a comma-separated list of integers, each with an optional leading minus sign.
 */
public static void commaSeparatedNumbers() {
    String value = "-11, 22, -31";      
    if (value.matches("^([-]?)\\d+(((,)|(,\\s))([-]?)\\d+)*")) {
        System.out.println("Valid...");
    } else {
        System.out.println("Invalid...");
    }
}

This

("((?:[^"\\])*(?:\\\")*(?:\\\\)*)*")

will capture all strings (within double quotes), including \" and \\ escape sequences. (Note that this answer assumes that the only escape sequences in your string are \" or \\ sequences; no other backslash characters or escape sequences will be captured.)

("(?:         # begin with a quote and capture...
  (?:[^"\\])* # any non-\, non-" characters
  (?:\\\")*   # any combined \" sequences
  (?:\\\\)*   # and any combined \\ sequences
  )*          # any number of times
")            # then, close the string with a quote

Try it out here!

Also, note that maksymiuk's accepted answer contains an "edge case" ("Imagine trying to actually match a string which ends with a backslash") which is actually just a malformed string. Something like

"this\"

...is not a "string ending on a backslash", but an unclosed string ending on an escaped quotation mark. A string which truly ends on a backslash would look like

"this\\"

...and the above solution handles this case.
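
A short Java sketch of that distinction, using the commented pattern above with its non-capturing inner group. The class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashAwareDemo {
    // The pattern above as a Java string literal: ordinary characters,
    // escaped quotes, and escaped backslashes, repeated any number of times.
    static final Pattern QUOTED = Pattern.compile(
            "(\"(?:(?:[^\"\\\\])*(?:\\\\\\\")*(?:\\\\\\\\)*)*\")");

    // Returns the first quoted string found, or null when there is none.
    static String firstMatch(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Raw input: "this\\"  (a closed string ending in an escaped backslash)
        System.out.println(firstMatch("\"this\\\\\""));
        // Raw input: "this\"   (an unclosed string ending in an escaped quote)
        System.out.println(firstMatch("\"this\\\""));
    }
}
```

The first call returns the full string and the second returns null. Because the quantifiers are nested, it may be worth testing this pattern on long unclosed inputs before relying on it.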


If you want to expand a bit, this...

(\\(?:b|t|n|f|r|\"|\\)|\\(?:(?:[0-2][0-9]{1,2}|3[0-6][0-9]|37[0-7]|[0-9]{1,2}))|\\(?:u(?:[0-9a-fA-F]{4})))

...captures all common escape sequences (including escaped quotes):

(\\                       # get the preceding slash (for each section)
  (?:b|t|n|f|r|\"|\\)     # capture common sequences like \n and \t

  |\\                     # OR (get the preceding slash and)...
  # capture variable-width octal escape sequences like \02, \13, or \377
  (?:(?:[0-2][0-9]{1,2}|3[0-6][0-9]|37[0-7]|[0-9]{1,2}))

  |\\                     # OR (get the preceding slash and)...
  (?:u(?:[0-9a-fA-F]{4})) # capture fixed-width Unicode sequences like \u0242 or \uFFAD
)
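
As an illustration, this Java sketch lists the escape sequences the pattern finds in a sample string. The class name, method name, and sample text are assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeSequenceDemo {
    // The escape-sequence pattern above as a Java string literal, split
    // across three lines to mirror its three alternatives.
    static final Pattern ESCAPE = Pattern.compile(
            "(\\\\(?:b|t|n|f|r|\\\"|\\\\)"
            + "|\\\\(?:(?:[0-2][0-9]{1,2}|3[0-6][0-9]|37[0-7]|[0-9]{1,2}))"
            + "|\\\\(?:u(?:[0-9a-fA-F]{4})))");

    // Collects every escape sequence found in the input, in order.
    static List<String> findEscapes(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ESCAPE.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The raw text holds four escapes: a tab escape, an escaped quote,
        // a four-digit Unicode escape, and a three-digit octal escape.
        String input = "tab\\t quote\\\" unicode\\u0242 octal\\377";
        for (String s : findEscapes(input)) {
            System.out.println(s);
        }
    }
}
```

One thing to keep in mind: the numeric alternative uses `[0-9]`, so it also accepts the digits 8 and 9, which are not valid in a strict octal escape.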

See this Gist for more information on the second point.

This works for me, and it is simpler than the current answer:

(?<!\\+)"(\\"|[^"])*(?<!\\+)"

(?<!\\+) - the quote must not be preceded by a backslash; this check appears on both the left and the right side of the expression.

(\\"|[^"])* - the content inside the quotes: either an escaped quote \\" or any character except a quote [^"]

The regexp works correctly for the following strings:

234 - false or null

"234" - true or ["234"]

"" - true or [""]

"234 + 321 \\"24\\"" - true or ["234 + 321 \\"24\\""]

"234 + 321 \\"24\\"" + 123 + "\\"test(\\"235\\")\\"" - true

or ["234 + 321 \\"24\\"", "\\"test(\\"235\\")\\""]

"234 + 321 \\"24\\"" + 123 + "\\"test(\\"235\\")\\"\\" - true

or ["234 + 321 \\"24\\""]
