Regular expression that matches “{$” AND NOT matches “\{$”

I am working on a project involving lexical analysis, and basically I have to generate tokens that are text and tokens that are not text.

  • Tokens that are text are all characters up to the "{$" sequence.
  • Tokens that are not text are all characters between the "{$" and "$}" sequences.

Note that the "{$" character sequence can be escaped by writing "\{$", in which case it also becomes part of the text.
My job is to read a String of text, and for that I am using regular expressions.

I am using the Java Scanner and Pattern classes and this is my work so far:

String text = "This is \\{$ just text$}\nThis is {$not_text$}.";
Scanner sc = new Scanner(text);
Pattern textPattern = Pattern.compile("{\\$"); // insert working regex here
sc.useDelimiter(textPattern);

System.out.println(sc.next());

This is what should be printed out:

This is \{$ just text$}
This is

How do I make a regex for the following logical statement:

"{$" AND NOT "\{$"

You can use a negative look-behind (?<!\\) in front of \{\$ to ensure that escaped curly braces are not matched (in a Java string literal every backslash has to be doubled, giving "(?<!\\\\)\\{\\$"):

(?<!\\)\{\$
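To see how this pattern behaves with the Scanner code from the question, here is a minimal sketch. The class name LookBehindDelimiter and the helper firstTextToken are made up for illustration; only the regex comes from the answer above.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class LookBehindDelimiter {
    // Hypothetical helper: returns everything before the first "{$"
    // that is not directly preceded by a backslash.
    static String firstTextToken(String input) {
        // Raw regex: (?<!\\)\{\$  (each backslash doubled for the Java literal)
        Pattern delimiter = Pattern.compile("(?<!\\\\)\\{\\$");
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(delimiter);
            return sc.next();
        }
    }

    public static void main(String[] args) {
        String text = "This is \\{$ just text$}\nThis is {$not_text$}.";
        // Prints "This is \{$ just text$}" and then "This is " on a second line.
        System.out.println(firstTextToken(text));
    }
}
```

The escaped \{$ on the first line is skipped because the look-behind sees the backslash, so the token only ends at the second, unescaped {$.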


Possible solution:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "This is \\{$ just text$}\nThis is {$not_text$}.";
Pattern textPattern = Pattern.compile(
          "(?<text>(?:\\\\.|(?!\\{\\$).)+)"   // text: an escaped character `\x`, or any character that does not start `{$`
        + "|"                                 // OR
        + "(?<nonText>\\{\\$.*?\\$\\})");     // non-text: `{$ ... $}`
Matcher m = textPattern.matcher(text);
while (m.find()) {
    // Exactly one of the two named groups is non-null for each match
    if (m.group("text") != null) {
        System.out.println("text : " + m.group("text"));
    } else {
        System.out.println("non-text : " + m.group("nonText"));
    }
}
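// Unrelated to the regex: "\012" is an octal escape (a newline), so this prints a line break followed by 34
System.out.println("\01234");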

Explanation:

From what I see, you want \ to be a special character used for escaping.
The problem now is to determine when \ is meant to escape the character or sequence after it, and when it should be treated as a simple printable character (a literal).

(possible problem)
Let's say that you have the text dir1\dir2\ and you want to add the non-text foo after it. How would you write it?

You could try writing dir1\dir2\{$foo$} but this could mean that you just escaped {$, which would prevent foo from being seen as non-text.

Java string literals face the same problem, since \ is used there to write other special characters:

  • escape pairs such as \n \r \t \"
  • Unicode escapes \uXXXX
  • octal escapes such as \012 .

The solution used in Java (and many other languages) is to make \ always a special character, so that creating a \ literal requires escaping it with another \ (there was no real need to add yet another special character for that). So to represent \ we need to write it as \\ .

So if we have the text dir1\dir2\ we would need to write it as dir1\\dir2\\ . This allows us to append {$non-text$} to it without fear that the last \\ placed right before {$ will be misinterpreted and prevent {$ from being seen as the start of a non-text sequence.

So now when we see dir1\\dir2\\{$foo$} we can interpret {$ properly.
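The escaping convention described above could be applied by a small helper like the following. This is only a sketch: the class EscapeSketch and the method escapeText are hypothetical names, and escaping a literal {$ inside text in the same pass is an assumption about how the input would be produced.

```java
class EscapeSketch {
    // Hypothetical helper: prepares raw text so that it can be followed by {$...$}.
    static String escapeText(String raw) {
        return raw
            .replace("\\", "\\\\")   // every literal \ becomes \\
            .replace("{$", "\\{$");  // assumption: a literal {$ in text becomes \{$
    }

    public static void main(String[] args) {
        // Raw text dir1\dir2\ is written as dir1\\dir2\\ before appending non-text
        System.out.println(escapeText("dir1\\dir2\\") + "{$foo$}"); // prints dir1\\dir2\\{$foo$}
    }
}
```

Backslashes are doubled first, so the backslash added in front of {$ in the second step is not doubled again.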

From this point on I am assuming you are also using this approach, which ensures proper interpretation of \ .

Now, let's try to create a rule which will let us find/separate text and non-text characters.

Based on our example we know that dir1\\dir2\\{$foo$} is: text dir1\\dir2\\ and non-text {$foo$} .
So as you can see, splitting on a {$ which is not preceded by \ can sometimes fail you: when the number of \ characters directly before {$ is even and non-zero, the {$ is not escaped, yet a look-behind for a single \ still rejects it.
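This failure can be reproduced with a short sketch. The class name LookBehindLimit and the helper findsStart are made up for illustration; the pattern is the single-backslash look-behind shown earlier.

```java
import java.util.regex.Pattern;

class LookBehindLimit {
    // Raw regex: (?<!\\)\{\$  -- "{$" not directly preceded by one backslash
    static final Pattern UNESCAPED_START = Pattern.compile("(?<!\\\\)\\{\\$");

    // Hypothetical helper: does the look-behind pattern find any "{$" start?
    static boolean findsStart(String input) {
        return UNESCAPED_START.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(findsStart("dir1{$foo$}"));               // true
        System.out.println(findsStart("dir1\\{$foo$}"));             // false: {$ really is escaped
        System.out.println(findsStart("dir1\\\\dir2\\\\{$foo$}"));   // false, although \\ is a literal backslash
    }
}
```

In the last case the {$ should start a non-text area, but the look-behind only sees that the previous character is a backslash and cannot tell that this backslash was itself escaped.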

A probably simpler solution is to accept (regexes shown in raw form, before Java string escaping)

  • for text:
    • \\. - any character preceded by \ (this handles the \\ literal and the escaped \{ , after which the rest of the $...$} part is accepted as ordinary text)
    • (?!\{\$). - any character that is not the { which would start a {$ area.
  • for non-text:
    • \{\$.*?\$\} - the {$...$} area; we know that it is unescaped because all escaped characters have already been accepted by \\. .
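Put together, these rules can be sketched as a small tokenizer. The class TokenizerSketch, the method tokenize and the "text:" / "non-text:" prefixes are hypothetical, and Pattern.DOTALL is an added assumption so that a text token may span line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizerSketch {
    static final Pattern TOKEN = Pattern.compile(
          "(?<text>(?:\\\\.|(?!\\{\\$).)+)"   // text: \x pairs, or characters not starting {$
        + "|(?<nonText>\\{\\$.*?\\$\\})",     // non-text: {$ ... $}
        Pattern.DOTALL);                      // assumption: let . match line breaks too

    // Hypothetical helper: returns each token prefixed with its kind.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group("text") != null) {
                tokens.add("text:" + m.group("text"));
            } else {
                tokens.add("non-text:" + m.group("nonText"));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [text:dir1\\dir2\\, non-text:{$foo$}]
        System.out.println(tokenize("dir1\\\\dir2\\\\{$foo$}"));
    }
}
```

Because \\. consumes backslashes in pairs from left to right, the {$ after dir1\\dir2\\ is recognised as unescaped, which the look-behind approach cannot do.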
