
Escaping special characters in Java Regular Expressions

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.

For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:

String digit = "d";
String point = ".";
String regex1 = "\\d+\\.\\d+";
String regex2 = Pattern.quote(digit + "+" + point + digit + "+");

Pattern numbers1 = Pattern.compile(regex1);
Pattern numbers2 = Pattern.compile(regex2);

System.out.println("Regex 1: " + regex1);

if (numbers1.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

System.out.println("Regex 2: " + regex2);

if (numbers2.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

Not surprisingly, the output produced by the above code is:

Regex 1: \d+\.\d+
    Match
Regex 2: \Qd+.d+\E
    No match

That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string d+.d+).

So, is there a method that would automatically escape each regex meta-character?

If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of

Pattern.escape('.')

would be the string "\.", but

Pattern.escape(',')

should just produce ",", since it is not a meta-character. Similarly,

Pattern.escape('d')

could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case, as 'd' could mean a literal 'd', which the regex interpreter would not mistake for something else, as it would with '.').


If you are looking for a way to create constants that you can use in your regex patterns, then just prepending them with "\\" should work, but there is no nice Pattern.escape('.') function to help with this.

So if you are trying to match the literal text \d (a backslash followed by the letter d, rather than a decimal digit), then you would do:

// this will match on \d as opposed to a decimal character
String matchBackslashD = "\\\\d";
// as opposed to
String matchDecimalDigit = "\\d";

The 4 backslashes in the Java string literal turn into 2 backslashes in the regex pattern. 2 backslashes in a regex pattern match the backslash itself. Prepending any special character with a backslash turns it into a normal character instead of a special one.

String matchPeriod = "\\.";
String matchPlus = "\\+";
String matchParens = "\\(\\)";
... 
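To make the two levels of escaping concrete, here is a small sketch that compares the Java string literals with the regexes they produce. The class and constant names are made up for illustration.

```java
import java.util.regex.Pattern;

class BackslashLevelsDemo {
    // Java literal "\\\\d" is the 3-character regex \\d: an escaped backslash, then the letter d.
    static final String MATCH_BACKSLASH_D = "\\\\d";
    // Java literal "\\d" is the 2-character regex \d: any decimal digit.
    static final String MATCH_DECIMAL_DIGIT = "\\d";

    public static void main(String[] args) {
        String backslashD = "\\d"; // the two-character text: backslash, d

        System.out.println(MATCH_BACKSLASH_D.length());                       // 3
        System.out.println(Pattern.matches(MATCH_BACKSLASH_D, backslashD));   // true
        System.out.println(Pattern.matches(MATCH_BACKSLASH_D, "7"));          // false
        System.out.println(Pattern.matches(MATCH_DECIMAL_DIGIT, "7"));        // true
        System.out.println(Pattern.matches(MATCH_DECIMAL_DIGIT, backslashD)); // false
    }
}
```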

In your post you use the Pattern.quote(string) method. This method wraps your string between \Q and \E so you can match it literally even if it happens to have a special regex character in it (+, ., \d, etc.).
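One way to apply this to the example from the question is to quote only the literal fragment and leave the rest of the expression as regex syntax. The sketch below assumes that is the intent; numberWithSeparator is a hypothetical helper, not a library method.

```java
import java.util.regex.Pattern;

class QuoteFragmentDemo {
    // Hypothetical helper: one or more digits, the literal separator, one or more digits.
    static Pattern numberWithSeparator(String separator) {
        // Only the separator is quoted; the \d+ parts stay regular-expression syntax.
        return Pattern.compile("\\d+" + Pattern.quote(separator) + "\\d+");
    }

    public static void main(String[] args) {
        Pattern decimal = numberWithSeparator(".");
        System.out.println(decimal.matcher("1.2").matches()); // true
        System.out.println(decimal.matcher("1x2").matches()); // false: the dot is matched literally
    }
}
```

Quoting the whole concatenated string, as regex2 does in the question, turns the digit and repetition parts into literal text as well, which is why it fails to match 1.2.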

I wrote this pattern:

Pattern SPECIAL_REGEX_CHARS = Pattern.compile("[{}()\\[\\].+*?^$\\\\|]");

And use it in this method:

String escapeSpecialRegexChars(String str) {

    return SPECIAL_REGEX_CHARS.matcher(str).replaceAll("\\\\$0");
}

Then you can use it like this, for example:

Pattern toSafePattern(String text)
{
    return Pattern.compile(".*" + escapeSpecialRegexChars(text) + ".*");
}

We needed to do that because, after escaping, we add some regex expressions. If not, you can simply use \Q and \E:

Pattern toSafePattern(String text)
{
    return Pattern.compile(".*\\Q" + text + "\\E.*");
}
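As a sketch of how the escapeSpecialRegexChars method above could answer the original question, the following builds the decimal-number regex from a literal point string. The wrapper class name is an assumption made so the example compiles on its own.

```java
import java.util.regex.Pattern;

class EscapeHelperDemo {
    // Same character class as above: the regex meta-characters to be escaped.
    static final Pattern SPECIAL_REGEX_CHARS = Pattern.compile("[{}()\\[\\].+*?^$\\\\|]");

    static String escapeSpecialRegexChars(String str) {
        // Replacement "\\\\$0" is a literal backslash followed by the matched character.
        return SPECIAL_REGEX_CHARS.matcher(str).replaceAll("\\\\$0");
    }

    public static void main(String[] args) {
        String point = ".";
        String regex = "\\d+" + escapeSpecialRegexChars(point) + "\\d+";

        System.out.println(regex);                         // \d+\.\d+
        System.out.println(Pattern.matches(regex, "1.2")); // true
        System.out.println(Pattern.matches(regex, "1x2")); // false
    }
}
```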

The only way the regex matcher knows you are looking for a digit and not the letter d is to escape the letter (\d). To type the regex escape character in Java, you need to escape it (so \ becomes \\). So, there's no way around typing double backslashes for special regex chars.

Pattern.quote(String s) sort of does what you want. However, it leaves a little to be desired; it doesn't actually escape the individual characters, it just wraps the string with \Q...\E.
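One practical difference between Pattern.quote and writing \Q...\E by hand is input that itself contains \E. The sketch below uses a contrived input to show that the manual wrapper lets part of the text act as regex syntax, while Pattern.quote still treats the whole text as a literal. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class QuoteVersusManualWrapDemo {
    // Manual wrapping: the text is pasted between \Q and \E unchanged.
    static Pattern manualWrap(String text) {
        return Pattern.compile("\\Q" + text + "\\E");
    }

    // Pattern.quote returns a literal pattern string for any input.
    static Pattern quoted(String text) {
        return Pattern.compile(Pattern.quote(text));
    }

    public static void main(String[] args) {
        String text = "x\\E.*\\Qy"; // the text x\E.*\Qy

        // The embedded \E ends the quote early, so .* becomes live regex syntax.
        System.out.println(manualWrap(text).matcher("xANYTHINGy").matches()); // true
        // With Pattern.quote the text is matched only as itself.
        System.out.println(quoted(text).matcher("xANYTHINGy").matches());     // false
        System.out.println(quoted(text).matcher(text).matches());             // true
    }
}
```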

There is not a method that does exactly what you are looking for, but the good news is that it is actually fairly simple to escape all of the special characters in a Java regular expression:

regex.replaceAll("[\\W]", "\\\\$0")

Why does this work? Well, the documentation for Pattern specifically says that it's permissible to escape non-alphabetic characters that don't necessarily have to be escaped:

It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.

For example, ; is not a special character in a regular expression. However, if you escape it, Pattern will still interpret \; as ;. Here are a few more examples:

  • > becomes \> which is equivalent to >
  • [ becomes \[ which is the escaped form of [
  • 8 is still 8.
  • \) becomes \\\) which is the escaped forms of \ and ) concatenated.

Note: The key is the definition of "non-alphabetic", which in the documentation really means "non-word" characters, or characters outside the character set [a-zA-Z_0-9].
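The examples in the list can be checked with a short sketch. escapeNonWordChars is a hypothetical name for the one-line replaceAll call shown earlier.

```java
import java.util.regex.Pattern;

class NonWordEscapeDemo {
    // Prefix every non-word character (anything outside [a-zA-Z_0-9]) with a backslash.
    static String escapeNonWordChars(String text) {
        return text.replaceAll("[\\W]", "\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(escapeNonWordChars(">"));   // \>
        System.out.println(escapeNonWordChars("["));   // \[
        System.out.println(escapeNonWordChars("8"));   // 8
        System.out.println(escapeNonWordChars("\\)")); // \\\)

        // The escaped text, compiled as a regex, matches only the original literal.
        String literal = "d+.d+ (v1.0)";
        Pattern p = Pattern.compile(escapeNonWordChars(literal));
        System.out.println(p.matcher(literal).matches());         // true
        System.out.println(p.matcher("dddxdd (v1x0)").matches()); // false
    }
}
```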

Use this utility function escapeQuotes() to escape strings that are placed inside the groups and sets of a regular expression.

List of regex literals to escape: <([{\^-=$!|]})?*+.>

public class RegexUtils {
    static String escapeChars = "\\.?![]{}()<>*+-=^$|";
    public static String escapeQuotes(String str) {
        if(str != null && str.length() > 0) {
            return str.replaceAll("[\\W]", "\\\\$0"); // \W designates non-word characters
        }
        return "";
    }
}

From the Pattern class documentation: the backslash character ('\') serves to introduce escaped constructs. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.

Example: The string to be matched is (hello), and the regex with a group is (\(hello\)). From here you only need to escape the matched string, as shown below.

public static void main(String[] args) {
    String matched = "(hello)", regexExpGrup = "(" + escapeQuotes(matched) + ")";
    System.out.println("Regex : "+ regexExpGrup); // (\(hello\))
}
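To show the escaped group actually capturing the literal text, here is a minimal sketch. findLiteral is a hypothetical helper, and escapeQuotes is repeated from the class above only so the example is self-contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedGroupDemo {
    // Same escaping as escapeQuotes() above.
    static String escapeQuotes(String str) {
        if (str != null && str.length() > 0) {
            return str.replaceAll("[\\W]", "\\\\$0"); // \W designates non-word characters
        }
        return "";
    }

    // Returns the first occurrence of the literal as captured by group 1, or null if absent.
    static String findLiteral(String literal, String input) {
        Pattern pattern = Pattern.compile("(" + escapeQuotes(literal) + ")");
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findLiteral("(hello)", "say (hello) now")); // (hello)
        System.out.println(findLiteral("(hello)", "say hello now"));   // null
    }
}
```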

I agree with Gray, as you may need your pattern to have both literals (\[, \]) and meta-characters ([, ]). So with some utility you should be able to escape all characters first, and then add the meta-characters you want to the same pattern.

Use

Pattern p = Pattern.compile("\"");
String s = p.toString() + "yourcontent" + p.toString();

This wraps yourcontent in literal double-quote characters and otherwise leaves it as is.
