
Escape special characters in a text when text is either enclosed in double quotes or not

I am writing a regex to escape a few special characters, including the double quote, in the input.

The input can be enclosed in double quotes, and those enclosing quotes should not be escaped.

Example inputs:

"te(st", te(st, te"st 

Expected outputs:

"te\(st", te\(st, te\"st

Code used:

String regex = "^\".*\"$";
String value = "\"strin'g\"";
// Characters to escape: ( ) ' " [ ] *
Pattern SPECIAL_REGEX_CHARS = Pattern.compile("[()'\"\\[\\]*]");

if (Pattern.matches(regex, value)) {
    // Enclosed in double quotes: escape only the text between them
    String val = value.substring(1, value.length() - 1);
    String replaceAll = SPECIAL_REGEX_CHARS.matcher(val).replaceAll("\\\\$0");
    replaceAll = "\"" + replaceAll + "\"";
    System.out.println(replaceAll);
} else {
    // Not enclosed: escape the whole text
    String replaceAll = SPECIAL_REGEX_CHARS.matcher(value).replaceAll("\\\\$0");
    System.out.println(replaceAll);
}

1 - Check whether the text is enclosed in double quotes. If it is, escape the special characters in the text between the quotes.

2 - Otherwise, escape the special characters in the whole text.

Is there a single regex that can combine #1 and #2?

Regards, Anil

Simple solution with one escaping regex only

You may use if (s.startsWith("\"") && s.endsWith("\"")) to check whether a string has both a leading and a trailing ". If it does, trim the leading and trailing " with replaceAll("^\"|\"$", ""), escape the rest using your escaping regex, and then add the " back. Otherwise, just escape the characters in your set.

String SPECIAL_REGEX_CHARS = "[()'\"\\[\\]*]";
String s = "\"te(st\""; // => "te\(st"
String result;
if (s.startsWith("\"") && s.endsWith("\"")) {
    result = "\"" + s.replaceAll("^\"|\"$", "").replaceAll(SPECIAL_REGEX_CHARS, "\\\\$0") + "\"";
}
else {
    result = s.replaceAll(SPECIAL_REGEX_CHARS, "\\\\$0");
}
System.out.println(result);

See another IDEONE demo
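The same check-then-escape logic can be wrapped in a reusable method. The following is only a sketch: the class name QuoteAwareEscaper and the method name escape are hypothetical, and the length check is an added assumption (not part of the answer above) so that a string consisting of a single " is treated as unquoted rather than as an empty quoted string.

```java
import java.util.regex.Pattern;

class QuoteAwareEscaper {
    // Same character set as in the question: ( ) ' " [ ] *
    private static final Pattern SPECIAL = Pattern.compile("[()'\"\\[\\]*]");

    // Hypothetical helper: escapes the special characters but leaves one
    // enclosing pair of double quotes untouched.
    static String escape(String s) {
        // Assumption: at least two characters are needed to form a quote pair.
        boolean quoted = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        String body = quoted ? s.substring(1, s.length() - 1) : s;
        // "\\\\$0" is the replacement string \\$0: a literal backslash, then the match.
        String escaped = SPECIAL.matcher(body).replaceAll("\\\\$0");
        return quoted ? "\"" + escaped + "\"" : escaped;
    }

    public static void main(String[] args) {
        // The three sample inputs from the question.
        for (String in : new String[] {"\"te(st\"", "te(st", "te\"st"}) {
            System.out.println(in + " -> " + escape(in));
        }
    }
}
```

Compiling the pattern once and using substring instead of a second replaceAll avoids re-parsing the regex on every call, which may matter if the method runs on many inputs.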

Alternative solution with appendReplacement "callback"

Here is how I would do that with one regex using an alternation:

String SPECIAL_REGEX_CHARS = "[()'\"\\[\\]*]";
//String s = "\"te(st\""; // => "te\(st"
//String s = "te(st"; // => te\(st
String s = "te\"st"; // => te\"st
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("(?s)\"(.*)\"|(.*)").matcher(s);
if (m.matches()) {
    if (m.group(1) == null) { // we have no quotes around
        m.appendReplacement(result, m.group(2).replaceAll(SPECIAL_REGEX_CHARS, "\\\\\\\\$0"));
    }
    else {
        m.appendReplacement(result, "\"" + m.group(1).replaceAll(SPECIAL_REGEX_CHARS, "\\\\\\\\$0") + "\"");
    }
}
m.appendTail(result);
System.out.println(result.toString());

See IDEONE demo

Main points:

  • Matcher#appendReplacement() together with Matcher#appendTail() allows manipulating the groups before they are written to the result.
  • The (?s)"(.*)"|(.*) regex has 2 alternative branches: "(.*)" matches a string starting with " and ending with " (note that (?s) is the DOTALL inline modifier, which lets . match line breaks as well), and the (.*) alternative just matches all other strings.
  • If the 1st alternative matches, we escape the selected special characters in the first capture group, and then add the " back on both ends.
  • If the second alternative matches, we just add the escaping symbol throughout Group 2.
  • To end up with a literal backslash, you need \\\\\\\\ (eight backslashes in the Java source) in the inner replacement pattern. The compiler, replaceAll, and appendReplacement each halve the count: 8 → 4 → 2 → 1.
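The backslash counting in the last point can be checked in isolation. The sketch below is an illustration under assumptions, not part of the answer: the class and method names are hypothetical, and the second method swaps in Matcher.quoteReplacement, a standard java.util.regex method, as an alternative to doubling the backslashes by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashLayers {
    static final String SPECIAL = "[()'\"\\[\\]*]";

    // Doubling by hand: 8 backslashes in source -> 4 in the string
    // -> 2 after replaceAll -> 1 after appendReplacement.
    static String viaDoubleEscaping(String s) {
        Matcher m = Pattern.compile("(?s).*").matcher(s);
        StringBuffer out = new StringBuffer();
        if (m.matches()) {
            String stage1 = s.replaceAll(SPECIAL, "\\\\\\\\$0");
            m.appendReplacement(out, stage1);
        }
        m.appendTail(out);
        return out.toString();
    }

    // Alternative: escape once, then let quoteReplacement protect every
    // backslash and dollar sign from appendReplacement.
    static String viaQuoteReplacement(String s) {
        Matcher m = Pattern.compile("(?s).*").matcher(s);
        StringBuffer out = new StringBuffer();
        if (m.matches()) {
            String stage1 = s.replaceAll(SPECIAL, "\\\\$0");
            m.appendReplacement(out, Matcher.quoteReplacement(stage1));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaDoubleEscaping("te\"st"));
        System.out.println(viaQuoteReplacement("te\"st"));
        // Input containing '$' is only safe with the quoteReplacement variant.
        System.out.println(viaQuoteReplacement("a$b(c"));
    }
}
```

Both variants give the same result for the inputs in the question. The hand-doubled version passes the input text itself to appendReplacement as a replacement pattern, so an input containing $ or \ would be misread there; quoteReplacement sidesteps that.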

You can use a negative lookbehind and lookahead:

System.out.println(value.replaceAll("([()'\\[\\]*]|(?<!^)\"(?!$))", "\\\\$0"));

This is essentially saying: escape anything in the character class [()'\[\]*], or any " that is neither at the beginning of the string nor at the end of it.
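To see the lookarounds at work on the inputs from the question, the one-liner can be wrapped in a small method. This is a minimal sketch; the class name LookaroundEscape and the method name escape are hypothetical.

```java
class LookaroundEscape {
    // First alternative: the character class ( ) ' [ ] *
    // Second alternative: a " that is not at the start, (?<!^), and not at the end, (?!$).
    static String escape(String value) {
        return value.replaceAll("([()'\\[\\]*]|(?<!^)\"(?!$))", "\\\\$0");
    }

    public static void main(String[] args) {
        // The three sample inputs from the question.
        for (String in : new String[] {"\"te(st\"", "te(st", "te\"st"}) {
            System.out.println(in + " -> " + escape(in));
        }
    }
}
```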

The only catch is that a leading or trailing quote is left unescaped regardless of whether it has a corresponding quote at the other end. If that's a problem, you can chain these replacements to escape an unmatched leading or trailing quote:

.replaceAll("^\".*[^\"]$", "\\\\$0")
.replaceAll("(^[^\"].*)(\"$)", "$1\\\\$2")
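Put together, the three replacements might be chained as follows. This is a sketch under assumptions: the class and method names are hypothetical, and inputs are assumed to be single-line, since these patterns do not use DOTALL.

```java
class UnmatchedQuoteEscape {
    static String escape(String value) {
        return value
            // Step 1: escape the character class and any inner double quote.
            .replaceAll("([()'\\[\\]*]|(?<!^)\"(?!$))", "\\\\$0")
            // Step 2: starts with " but does not end with one, so prefix a backslash.
            .replaceAll("^\".*[^\"]$", "\\\\$0")
            // Step 3: ends with " but does not start with one, so escape the last quote.
            .replaceAll("(^[^\"].*)(\"$)", "$1\\\\$2");
    }

    public static void main(String[] args) {
        // Properly enclosed, unmatched leading quote, unmatched trailing quote.
        for (String in : new String[] {"\"te(st\"", "\"te(st", "te(st\""}) {
            System.out.println(in + " -> " + escape(in));
        }
    }
}
```

A properly enclosed string passes through steps 2 and 3 unchanged, because step 2 requires a non-quote last character and step 3 requires a non-quote first character.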

