How do I strip a line comment of a line containing multiple strings and comment symbols

I'd like to parse KConf files, which contain single-line comments introduced by the # character. Below is an example of such a file.

https://github.com/torvalds/linux/blob/master/arch/x86/Kconfig

I am aware that the single-line test string looks almost random, but it should contain most if not all variants of nested hashes, strings, and quotes in comments that do not introduce a string.

The regex engine I currently use is the one in Groovy, which is based on Java.

Test string

Lorem "ipsum # \\" dolor" sit amet, 'consectetur # \\' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.

Desired result

Lorem "ipsum # \\" dolor" sit amet, 'consectetur # \\' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non

or (with the leading space)

#bibendum 'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.

First of all, I've escaped your string so it can be stored as a variable using JavaScript (since you don't seem to indicate a language, I'll assume JS):

var str = 'Lorem "ipsum # " dolor" sit amet, \'consectetur # \' adipiscing\' elit. Maecenas \'suscipit#mollis\' quam, non #bibendum \'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.';

To remove everything from a " " followed by a "#" that is not itself followed by a space:

str.replace(/ #[^ ].*/, '');

Lastly, your second desired result makes absolutely no sense.
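
For readers on Groovy or Java, here is a minimal sketch of the same one-line replacement, assuming `replaceFirst` as the closest equivalent of a non-global JavaScript `replace`. The class name `NaiveStrip` and the shortened sample lines are made up for illustration. The second call suggests a limitation of this pattern: it is not quote-aware, so a " #" inside a quoted string is treated as the start of a comment.

```java
class NaiveStrip {
    // Remove everything from the first " #" that is followed by a non-space character.
    static String strip(String line) {
        return line.replaceFirst(" #[^ ].*", "");
    }

    public static void main(String[] args) {
        // No space before the hash inside the quotes: only the trailing comment is cut.
        System.out.println(strip("Maecenas 'suscipit#mollis' quam, non #bibendum"));
        // Space before the hash inside the quotes: the cut happens inside the string.
        System.out.println(strip("Maecenas 'suscipit #mollis' quam, non #bibendum"));
    }
}
```

This is why the other answers match quoted strings explicitly before looking for the comment hash.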

All of this would of course be helped by a proper description.

Based on the limited information, this regex might work.
It seems a little overly complex trying to distinguish embedded hashes from comments, though.
I didn't have time to test it, but cut and pasted a few regex pieces.
Note that it should be used in multi-line mode, and everything is geared to a line parse.
I.e., nothing in the regex will span lines.

 #  (?-s)^(?:"[^"\\\n]*(?:\\.[^"\\\n]*)*"|'[^'\\\n]*(?:\\.[^'\\\n]*)*'|[^#"'\s]+|(?<=[^\s#])\#+|[^\S\n]+(?!\#))*(?:[^\S\n]+|^)(\#.*)$
 #  "(?-s)^(?:\"[^\"\\\\\\n]*(?:\\\\.[^\"\\\\\\n]*)*\"|'[^'\\\\\\n]*(?:\\\\.[^'\\\\\\n]*)*'|[^#\"'\\s]+|(?<=[^\\s#])\\#+|[^\\S\\n]+(?!\\#))*(?:[^\\S\\n]+|^)(\\#.*)$"

 (?-s)                   # Modifier, No dot all 
 ^                       # Beginning of line
 (?:
      "                       # Double quotes
      [^"\\\n]* 
      (?: \\ . [^"\\\n]* )*
      "
   |                        # or
      '                       # Single quotes
      [^'\\\n]* 
      (?: \\ . [^'\\\n]* )*
      '
   |                        # or
      [^#"'\s]+               # Not hash, quotes, whitespace
   |                        # or
      (?<= [^\s#] )           # Preceded by a character, but not hash or whitespace
      \#+                     # Embedded hashes
   |                        # or
      [^\S\n]+                # Whitespaces (non-newline)
      (?! \# )                # Not followed by hash
 )*
 (?: [^\S\n]+ | ^ )      # Whitespaces  (non-newline) or BOL
 ( \# .* )               # (1), hash comment
 $                       # End of line
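
The rules spelled out in the breakdown above can also be written as a small hand-rolled scanner, which may be easier to test than the regex itself. This is a sketch of a swapped-in technique, not the regex from this answer: it assumes that quoted strings may contain backslash escapes, that a hash directly after a non-whitespace character is an embedded hash, and that a hash at the beginning of the line or after whitespace starts the comment. The class and method names (`HashCommentScanner`, `commentStart`, `stripComment`) are hypothetical.

```java
class HashCommentScanner {
    /** Returns the index of the '#' that starts the comment, or -1 if there is none. */
    static int commentStart(String line) {
        char quote = 0; // 0 means "outside a string", otherwise the opening quote character
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++;           // skip the escaped character inside the string
                } else if (c == quote) {
                    quote = 0;     // closing quote
                }
            } else if (c == '"' || c == '\'') {
                quote = c;         // opening quote
            } else if (c == '#' && (i == 0 || Character.isWhitespace(line.charAt(i - 1)))) {
                return i;          // hash at line start or after whitespace
            }
        }
        return -1;
    }

    /** Returns the line without its comment and without trailing whitespace before it. */
    static String stripComment(String line) {
        int start = commentStart(line);
        return start < 0 ? line : line.substring(0, start).replaceFirst("\\s+$", "");
    }
}
```

Calling `stripComment` on the test string from the question should cut at ` #bibendum` and keep the hashes inside both kinds of quotes; a line with an unterminated quote is returned unchanged.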

Raw regex:

^((?:\\.|("|')(?:(?!\2|\\|[\r\n]).|\\.)*\2|[^#'"\r\n])+)#.+

Replace with $1:

Example:

String re = "^((?:\\\\.|(\"|')(?:(?!\\2|\\\\|[\\r\\n]).|\\\\.)*\\2|[^#'\"\\r\\n])+)#.+";
String line = "Lorem \"ipsum # \\\" dolor\" sit amet, 'consectetur # \\' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend \"in. Duis # convallis\" luctus nunc, ac luctus lectus dapibus at.";
String uncommented = line.replaceAll(re, "$1");

//=> Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non

regex101 demo

ideone demo

Breakdown:

^                         # Beginning of line
  (                       # Beginning of 1st capture group
    (?:                   # Non-capture group 1
      \\.                 # Match an escaped character
    |
      ("|')               # Or, a quote (and capture it in 2nd capture group),
      (?:                 # Non-capture group 2
        (?!\2|\\|[\r\n]). # Followed by any character except relevant quote, \ or newline
      |
        \\.               # Or an escaped character
      )*                  # Close of non-capture group 2 and repeat as many times
      \2                  # Close the quoted part
    |
      [^#'"\r\n]          # Any non-hash, single/double quote, newline characters
    )+                    # Close of non-capture group 1 and repeat as many times
  )                       # Close capture group 1
  #.+                     # Match comments
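
To run the same pattern over a whole file rather than a single line, one option is to add the `(?m)` flag so that `^` matches at every line start. The sketch below assumes that flag (it is not part of the regex above) and uses a hypothetical class name and made-up sample lines. It also illustrates two properties that follow from the breakdown: the whitespace before the hash is kept because it belongs to capture group 1, and a line that starts with `#` is left alone because group 1 must match at least one character.

```java
import java.util.regex.Pattern;

class HashCommentRegexDemo {
    // The regex from this answer, prefixed with (?m) (an assumption) for multi-line input.
    static final Pattern COMMENT = Pattern.compile(
        "(?m)^((?:\\\\.|(\"|')(?:(?!\\2|\\\\|[\\r\\n]).|\\\\.)*\\2|[^#'\"\\r\\n])+)#.+");

    static String strip(String text) {
        // Keep only capture group 1, i.e. everything before the comment hash.
        return COMMENT.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "a = 1 # note",        // comment removed, the space before '#' stays
            "s = \"x # y\"",       // hash inside quotes, line unchanged
            "# whole line");       // full-line comment, not matched by this regex
        System.out.println(strip(text));
    }
}
```

If full-line comments should be removed as well, they need separate handling, for example by dropping lines whose first non-blank character is `#` before applying the pattern.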
