Regular expression to match unescaped special characters only

I'm trying to come up with a regular expression that can match only characters not preceded by a special escape sequence in a string.

For instance, in the string Is ? stranded//?, I want to be able to replace the ? which hasn't been escaped with another string, so I can have this result: **Is Dave stranded?**

But for the life of me I have not been able to figure out a way. I have only come up with regular expressions that eat all the replaceable characters.

How do you construct a regular expression that matches only characters not preceded by an escape sequence?

Use a negative lookbehind; this is exactly what they were designed to do!

(?<!//)[\?]

To break it down:

(
    ?<!    #The negative lookbehind.  It checks that the slashes below do not come directly before the match.
    //     #The slashes you are trying to avoid.
)
[\?]       #Your special character list.

Only if the // is not found directly before the current position will the match proceed with the rest of the pattern.

I think in Java the backslash will need to be escaped again inside the string literal, something like:

Pattern p = Pattern.compile("(?<!//)[\\?]");
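As a minimal sketch of how that pattern could be used, assuming the escape sequence is // and the only special character is ?, the lookbehind can be passed straight to replaceAll. The class and method names here (LookbehindReplace, replaceUnescaped) are made up for illustration and are not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindReplace {
    // Matches a ? that is not immediately preceded by the // escape sequence.
    private static final Pattern UNESCAPED = Pattern.compile("(?<!//)[?]");

    // Hypothetical helper: replaces every unescaped ? and leaves escaped ones alone.
    static String replaceUnescaped(String input, String replacement) {
        // quoteReplacement keeps any $ or \ in the replacement text literal.
        return UNESCAPED.matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Prints: Is Dave stranded//?
        System.out.println(replaceUnescaped("Is ? stranded//?", "Dave"));
    }
}
```

Note that the // markers are still in the result; stripping them is a separate step.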

Try this Java code:

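import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "Is ? stranded//?";
// Group 1 captures a ? that is not preceded by the // escape sequence.
Pattern p = Pattern.compile("(?<!//)([?])");
Matcher m = p.matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Swap each unescaped ? for the replacement text.
    m.appendReplacement(sb, m.group(1).replace("?", "Dave"));
}
m.appendTail(sb);
// Strip the escape sequences so the escaped ? is left as a plain ?.
String s = sb.toString().replace("//", "");
System.out.println("Output: " + s);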

OUTPUT

Output: Is Dave stranded?

I was thinking about this and have a second, simpler solution that avoids regexes. The other answers are probably better, but I thought I might post it anyway.

String input = "Is ? stranded//?"; 
String output = input
    .replace("//?", "a717efbc-84a9-46bf-b1be-8a9fb714fce8")
    .replace("?", "Dave")
    .replace("a717efbc-84a9-46bf-b1be-8a9fb714fce8", "?");

Just protect the "//?" by replacing it with something unique (like a GUID). Then you know any remaining question marks are fair game.
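If a hard-coded GUID feels fragile, the placeholder could be generated at run time. This is only a sketch of the same protect / replace / restore idea; the class and method names (PlaceholderReplace, replaceUnescaped) are hypothetical, and the loop that regenerates the token on a collision is my own addition.

```java
import java.util.UUID;

class PlaceholderReplace {
    // Hypothetical helper illustrating the protect / replace / restore steps.
    static String replaceUnescaped(String input, String replacement) {
        // Generate a token that occurs in neither the input nor the replacement.
        String token;
        do {
            token = UUID.randomUUID().toString();
        } while (input.contains(token) || replacement.contains(token));

        return input
                .replace("//?", token)      // protect the escaped question marks
                .replace("?", replacement)  // replace the remaining, unescaped ones
                .replace(token, "?");       // restore the protected ones without the escape
    }

    public static void main(String[] args) {
        // Prints: Is Dave stranded?
        System.out.println(replaceUnescaped("Is ? stranded//?", "Dave"));
    }
}
```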

Use grouping. Here's one example:

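import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        // Group 1: two non-slash characters; group 2: the literal ? that follows them.
        Pattern p = Pattern.compile("([^/][^/])(\\?)");
        String s = "Is ? stranded//?";
        Matcher m = p.matcher(s);
        String result = s;
        // find() looks for a match anywhere; matches() would require the whole string to match.
        if (m.find())
            // Keep group 1, swap the ? for XXX, then strip the escape sequences.
            result = m.replaceAll("$1XXX").replace("//", "");
        System.out.println(s + " -> " + result);
    }
}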

Output:

$ java Test
Is ? stranded//? -> Is XXX stranded?

In this example, I'm:

  • first replacing any non-escaped ? with "XXX",
  • then removing the "//" escape sequences.

EDIT: Check that the matcher actually finds a match (the if guard above) before replacing, so that non-matching strings are handled properly.

This is just a quick-and-dirty example. You need to flesh it out, obviously, to make it more robust. But it gets the general idea across.
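One possible way to flesh it out (a sketch, not the answer's own code) is to let a single pattern match either the escaped sequence or a bare ?, and use the group to tell the two apart. That handles a ? at the very start of the string and only removes a // that actually escapes a ?. The names SinglePassReplace and replaceUnescaped are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassReplace {
    // Group 1 matches an escaped ?; the second alternative matches a bare ?.
    private static final Pattern TOKEN = Pattern.compile("(//[?])|[?]");

    // Hypothetical helper: replaces unescaped ? and unescapes //? in one pass.
    static String replaceUnescaped(String input, String replacement) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Escaped: emit a literal ?. Unescaped: emit the replacement text.
            String out = (m.group(1) != null) ? "?" : replacement;
            m.appendReplacement(sb, Matcher.quoteReplacement(out));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: Is Dave stranded?
        System.out.println(replaceUnescaped("Is ? stranded//?", "Dave"));
    }
}
```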

I used this one:

((?:^|[^\\])(?:\\\\)*[ESCAPABLE CHARACTERS HERE])

Demo: https://regex101.com/r/zH1zO3/4
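Note that this pattern uses a backslash as the escape character rather than //, and it also allows for escaped backslashes in front of the special character. Here is a rough Java sketch of it, assuming ? is the only escapable character. I moved the capture group so that it covers only the prefix, which lets the replacement keep it via $1; the names BackslashEscapes and replaceUnescaped are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashEscapes {
    // Group 1: start of input or one non-backslash character, followed by any
    // number of backslash pairs. The trailing class lists the escapable
    // characters (only ? here, as an assumption).
    private static final Pattern UNESCAPED =
            Pattern.compile("((?:^|[^\\\\])(?:\\\\\\\\)*)[?]");

    // Hypothetical helper: replaces a ? that is not escaped by a backslash.
    static String replaceUnescaped(String input, String replacement) {
        // $1 puts the captured prefix back; quoteReplacement keeps the rest literal.
        return UNESCAPED.matcher(input)
                .replaceAll("$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // The input is: Is ? stranded\?   Prints: Is Dave stranded\?
        System.out.println(replaceUnescaped("Is ? stranded\\?", "Dave"));
    }
}
```

Because the prefix consumes the character in front of the match, two unescaped special characters in a row will not both be matched; a lookbehind version avoids that.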

Match on a set of characters OTHER than an escape sequence, followed by a regex special character. You could use an inverted character class ( [^/] ) for the first bit. Handle an unescaped special character at the front of the string as a special case.

Try matching:

(^|(^.)|(.[^/])|([^/].))[special characters list]

String aString = "Is ? stranded//?";

String regex = "(?<!//)[^a-z^A-Z^\\s^/]";
System.out.println(aString.replaceAll(regex, "Dave"));

The [^a-z^A-Z^\\s^/] part of the regular expression matches any single character that is not a letter, whitespace, or a forward slash. (The extra ^ characters inside the class are literal carets, so a caret is excluded as well.)

The (?<!//) part does a negative lookbehind; see the documentation on lookbehind for more info.

This gives the output Is Dave stranded//?
