Match parameter values in string using regex

I have a string containing several parameters, e.g.:

PARAM1="someValue", PARAM2="someOtherValue"...

For log output I want to "hide" some of the parameters' values, i.e. replace them with ***.

I use the following regex to match the parameter value, which works fine in most cases:

(PARAMNAME=")[\w\s]*"

However, this regex only matches word and whitespace characters. I want to extend it to match all characters between the two quotation marks. The problem is that the value itself can contain (escaped) quotation marks, e.g.:

PARAM="the name of this param is \"param\""

How can I match (and replace) that correctly?

My Java method looks like this:

/**
 * @param input input string
 * @param params list of parameters to hide
 * @return string with the value of each listed parameter replaced by ***
 */
public static String hideParamValue(String input, final String... params)
{
    for (String param : params)
    {
        input = input.replaceAll("(" + param + "=)\\\"[\\w\\s]*\\\"", "$1***");
    }
    return input;
}

Escaped quotes are a real PITA in Java, but this should do the trick:

public class Test
{
  public static String hideParamValue(String input, final String... params)
  {
    for (String param : params)
    {
      input = input.replaceAll(
        "(" + param + "=)\"(?:[^\"\\\\]|\\\\.)*\"",
        "$1***");
    }
    return input;
  }

  public static void main(String[] args)
  {
    String s = "PARAM1=\"a b c\", PARAM2=\"d \\\"e\\\" f\", PARAM3=\"g h i\"";
    System.out.println(s);
    System.out.println(hideParamValue(s, "PARAM2", "PARAM3"));
  }
}

Output:

PARAM1="a b c", PARAM2="d \"e\" f", PARAM3="g h i"
PARAM1="a b c", PARAM2=***, PARAM3=***

[^\"\\\\] matches any one character other than a quotation mark or a backslash. The backslash has to be escaped with another backslash for the regex, then each of those has to be escaped for the string literal. But the quotation mark has no special meaning in a regex, so it only needs one backslash.
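To see the two escaping layers separately, it can help to print the string that the regex engine actually receives once the compiler has processed the literal. This is only a minimal sketch; the class name EscapeLayers is made up for illustration.

```java
import java.util.regex.Pattern;

final class EscapeLayers {
    // The character class exactly as it is written in the Java source above.
    static final String CHAR_CLASS = "[^\"\\\\]";

    public static void main(String[] args) {
        // After the compiler has processed the literal, the regex engine sees
        // six characters: [ ^ " \ \ ]
        System.out.println(CHAR_CLASS);
        System.out.println(CHAR_CLASS.length());

        Pattern p = Pattern.compile(CHAR_CLASS);
        System.out.println(p.matcher("a").matches());   // ordinary character
        System.out.println(p.matcher("\"").matches());  // a quotation mark
        System.out.println(p.matcher("\\").matches());  // a single backslash
    }
}
```

This prints [^"\\], then 6, then true, false, false: the class accepts an ordinary character and rejects both the quotation mark and the backslash.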

(?:[^\"\\\\]|\\\\.) matches anything except a quotation mark or a backslash, OR a backslash followed by anything. That takes care of your escaped quotation marks, and also allows for escaped backslashes and other escape sequences, at no extra cost.

The negative-lookbehind approach suggested by @axtavt only handles escaped quotes, and it treats \\" as a backslash followed by an escaped quote, when it was probably intended as an escaped backslash followed by a quote.
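The difference shows up when a value ends with an escaped backslash. The sketch below runs both patterns from this thread on such an input; the class name, method names, and sample string are invented for the demonstration.

```java
final class EscapedBackslashDemo {
    // Alternation approach: consume either an ordinary character
    // or a backslash together with the character that follows it.
    static String hideWithAlternation(String input, String param) {
        return input.replaceAll("(" + param + "=)\"(?:[^\"\\\\]|\\\\.)*\"", "$1***");
    }

    // Lookbehind approach: stop at the first quotation mark
    // that is not directly preceded by a backslash.
    static String hideWithLookbehind(String input, String param) {
        return input.replaceAll("(" + param + "=)\".*?(?<!\\\\)\"", "$1***");
    }

    public static void main(String[] args) {
        // Raw text: PARAM1="ends with \\", PARAM2="visible"
        String s = "PARAM1=\"ends with \\\\\", PARAM2=\"visible\"";
        System.out.println(s);
        System.out.println(hideWithAlternation(s, "PARAM1"));
        System.out.println(hideWithLookbehind(s, "PARAM1"));
    }
}
```

With this input the alternation version prints PARAM1=***, PARAM2="visible", while the lookbehind version rejects the real closing quote, runs on to the next unescaped one, and prints PARAM1=***visible".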

Try this regular expression:

PARAM="(?:[^"\\]|\\")*"

This only allows a sequence of either any character except " and \, or a \". If you want to allow other escape sequences than just \", you can extend it with \\["rnt…], for example, to also allow \r, \n, \t, etc.
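As a rough illustration of the strict pattern and one possible extension, the sketch below compares the two in Java. The escape set used in EXTENDED (the quotation mark plus r, n, t) is an assumption chosen for the demo, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class RestrictedEscapes {
    // Only \" is accepted as an escape sequence inside the value.
    static final Pattern STRICT =
        Pattern.compile("(PARAM=)\"(?:[^\"\\\\]|\\\\\")*\"");

    // Also accepts \r, \n and \t; this particular set is an assumption for the demo.
    static final Pattern EXTENDED =
        Pattern.compile("(PARAM=)\"(?:[^\"\\\\]|\\\\[\"rnt])*\"");

    static String hide(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("$1***");
    }

    public static void main(String[] args) {
        String quoted = "PARAM=\"say \\\"hi\\\"\"";    // raw: PARAM="say \"hi\""
        String newline = "PARAM=\"line1\\nline2\"";    // raw: PARAM="line1\nline2"
        System.out.println(hide(STRICT, quoted));
        System.out.println(hide(STRICT, newline));
        System.out.println(hide(EXTENDED, newline));
    }
}
```

The strict pattern hides the first value (PARAM=***) but leaves the second input unchanged, because \n is not an escape it accepts; the extended pattern hides that one as well.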

You have to add the escaped double quote to your matching character class:

[\w\s\\"] instead of [\w\s], which, escaped in your Java String, becomes [\\w\\s\\\\\"] instead of [\\w\\s].

Thus, the final code will be:

/**
 * @param input input string
 * @param params list of parameters to hide
 * @return string with the value of each listed parameter replaced by ***
 */
public static String hideParamValue(String input, final String... params) {
    for (String param : params)
    {
        input = input.replaceAll("(" + param + "=)\\\"[\\w\\s\\\\\"]*\\\"", "$1***");
    }
    return input;
}

A negative lookbehind may be useful in this case:

(PARAMNAME=").*?(?<!\\)"

that is:

s.replaceAll("(" + param + "=)\".*?(?<!\\\\)\"", "$1***");

(?<!\\)" means a " not preceded by \, so that .*?(?<!\\)" means the shortest possible (due to the reluctant *?) sequence of any characters terminated by a " that is not preceded by \.
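The effect of the reluctant quantifier can be checked by running the same pattern with a greedy .* instead. A small sketch, reusing the sample string from the test program earlier in this thread; the class and method names are made up for the comparison.

```java
final class ReluctantVsGreedy {
    // Reluctant .*? stops at the first unescaped quotation mark.
    static String hideReluctant(String input, String param) {
        return input.replaceAll("(" + param + "=)\".*?(?<!\\\\)\"", "$1***");
    }

    // Greedy .* runs to the last unescaped quotation mark in the input.
    static String hideGreedy(String input, String param) {
        return input.replaceAll("(" + param + "=)\".*(?<!\\\\)\"", "$1***");
    }

    public static void main(String[] args) {
        String s = "PARAM1=\"a b c\", PARAM2=\"d \\\"e\\\" f\", PARAM3=\"g h i\"";
        System.out.println(hideReluctant(s, "PARAM2"));
        System.out.println(hideGreedy(s, "PARAM2"));
    }
}
```

The reluctant version prints PARAM1="a b c", PARAM2=***, PARAM3="g h i". The greedy version swallows everything up to the final quote and prints PARAM1="a b c", PARAM2=***, so PARAM3 disappears from the log line.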
