How to use regex to match strings that don't contain special characters (&, \, <, >, |, space) unless they are preceded by a backslash

Right now I am using [^ \\&<>|\t\n]+, which matches a string made up only of characters other than space, \, &, <, >, |, \t and \n. What I also want is to let you escape any of these special characters, so that (for example) \< or \& would still allow my entire string to be matched.

Should match:

abcdefghijk
abcdef\&hdehud\<jdow\\

Should not match:

abcdefhfh&kdjeid
abcdjedje\idwjdj
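To see the problem concretely, here is a minimal C++ sketch that checks the examples above against the current pattern with std::regex_match, which requires the whole string to match. It assumes std::regex's default ECMAScript grammar, where the bracket expression should mean the same thing as in Flex; the function name matches_current_pattern is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: tests a string against the question's current pattern.
// The set excludes space, backslash, &, <, >, |, tab and newline, and there is
// no escape handling, so any backslash makes the match fail.
bool matches_current_pattern(const std::string& s) {
    static const std::regex current(R"re([^ \\&<>|\t\n]+)re");
    return std::regex_match(s, current);  // whole-string match
}
```

With this pattern abcdefghijk matches, but abcdef\&hdehud\<jdow\\ does not: the escaped characters are rejected like any other special character.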

I found this pattern, ([^\[]|(?<=\\)\[)+, which does the same thing for just the "[" character, but I couldn't figure out how to extend it to any additional characters.

Any idea how I can make the exception for characters preceded by a backslash?

If it makes any difference, I'm using this in Flex and C++ to tokenize a string for a shell. I believe I need to use negative lookbehinds, but I don't know how to do that with multiple characters.
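The lookbehind route may not be available in this toolchain at all. Flex patterns have no lookbehind (only trailing context with /), and C++ std::regex, whose default ECMAScript grammar supports lookahead (?= and (?! but not lookbehind, should reject such a pattern when it is constructed. The small check below is a sketch for confirming this on a given standard library; the helper name std_regex_accepts is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if std::regex (default ECMAScript grammar)
// can compile the pattern, false if construction throws std::regex_error.
bool std_regex_accepts(const std::string& pattern) {
    try {
        std::regex re(pattern);  // compile only; the result is not used
        return true;
    } catch (const std::regex_error&) {
        return false;  // e.g. unsupported (?<= ... ) lookbehind syntax
    }
}
```

Passing the "[" pattern from the question returns false, while a pattern that consumes the backslash explicitly, instead of looking behind for it, compiles without problems.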

You are already most of the way to the answer.

You are using the negated set [^ \\&<>|\t\n] to specify which characters may not be present, so all you have to do is use the same set without the negation, preceded by \\ (a literal backslash), to match an escaped character. That gives you \\[ \\&<>|\t\n], which can be read as "a \ followed by any one of the items in the set". Now combine the two and you get ([^ \\&<>|\t\n]|\\[ \\&<>|\t\n])+.

To break it down:

One or more of: [^ \\&<>|\t\n] (any character that is not special) or \\[ \\&<>|\t\n] (a backslash followed by exactly one special character).
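Here is a minimal sketch of the combined pattern, checked with C++ std::regex_match against the four examples from the question. The function name is_valid_word is hypothetical, and the raw string literal keeps the regex readable without doubling every backslash; a Flex rule would use the same pattern text.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the whole string must consist of one or more of
//   * a character that is not special, or
//   * a backslash followed by exactly one special character.
// Special characters: space, backslash, &, <, >, |, tab, newline.
bool is_valid_word(const std::string& s) {
    static const std::regex word(R"re(([^ \\&<>|\t\n]|\\[ \\&<>|\t\n])+)re");
    return std::regex_match(s, word);
}
```

Only abcdefghijk and abcdef\&hdehud\<jdow\\ match. abcdjedje\idwjdj fails because \i is not one of the allowed escapes, and abcdefhfh&kdjeid fails on the bare &.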

As usual, using a regular expression here is overkill. This is a simple text search:

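#include <string>

// Returns false if str has an &, <, >, | or \ that is not escaped by a preceding backslash.
bool all_specials_escaped(const std::string& str) {
    const std::string target = "\\&<>|";
    std::string::size_type pos = str.find_first_of(target);  // an index, not an iterator
    while (pos != std::string::npos) {
        // A special character is only allowed as the second half of "\x" where x is special.
        if (str[pos] != '\\' || pos + 1 == str.size() || target.find(str[pos + 1]) == std::string::npos)
            return false;
        pos = str.find_first_of(target, pos + 2);  // skip past the escape pair
    }
    return true;
}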

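Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.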

© 2020-2024 STACKOOM.COM