
Spot ? character literals in a regular expression string

I'm building a developer tool, and in one input field my users can input regular expressions.

If they enter an expression that tries to match a literal ? character anywhere then they've probably made a mistake, as I know that ? specifically is guaranteed to never appear in the string to match (and if they're trying to spot one, then there's a different action they should take instead). I would like to show a warning in that case.

How can I quickly check from a string containing a regular expression whether it contains a literal ? character? E.g. I want to warn about regular expression strings like hello\\? but not https?

Detecting \\? is probably a good start, but I imagine there are other cases too.

I'm building this in JavaScript. Solutions based on simple string processing are preferable to fully parsing the regular expression, if possible.

Consider using an existing regular expression parser which outputs an AST.

For example, for JavaScript:
https://www.npmjs.com/package/regjsparser
https://github.com/jviereck/regjsparser

The demo page here allows you to see the generated AST:
http://www.julianviereck.de/regjsparser/

Then you could look for a "codePoint" of 63 (the code point of "?") in the AST:

{
  "type": "value",
  "kind": "identifier",
  "codePoint": 63,
  "range": [
    15,
    17
  ],
  "raw": "\\?"
}
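
As a rough sketch of that lookup, the function below walks a tree shaped like the node above and collects every "value" node whose "codePoint" is 63. The helper name findValueNodes and the hand-written sample tree are assumptions for illustration; they are not part of regjsparser, and the real AST for your pattern may contain node types not covered here.

```javascript
// Sketch: recursively walk an AST shaped like the regjsparser output above
// and collect every "value" node with the given code point.
// findValueNodes is a hypothetical helper, not a regjsparser API.
function findValueNodes(node, codePoint, found = []) {
  if (Array.isArray(node)) {
    for (const child of node) findValueNodes(child, codePoint, found);
  } else if (node !== null && typeof node === 'object') {
    if (node.type === 'value' && node.codePoint === codePoint) {
      found.push(node);
    }
    // Descend into every property; numbers and strings are ignored above.
    for (const key of Object.keys(node)) {
      findValueNodes(node[key], codePoint, found);
    }
  }
  return found;
}

// Hand-written tree for illustration only: "o" followed by an escaped "?".
const sampleAst = {
  type: 'alternative',
  body: [
    { type: 'value', kind: 'symbol', codePoint: 111, range: [0, 1], raw: 'o' },
    { type: 'value', kind: 'identifier', codePoint: 63, range: [1, 3], raw: '\\?' },
  ],
  range: [0, 3],
  raw: 'o\\?',
};

const QUESTION_MARK = '?'.codePointAt(0); // 63
const hits = findValueNodes(sampleAst, QUESTION_MARK);
console.log(hits.map((n) => n.raw));
```

In real use the tree would come from the parser, e.g. require('regjsparser').parse(pattern, flags), rather than being written by hand. Because the walk only looks at "value" nodes, a ? used as a quantifier (as in https?) should not be reported, assuming the parser represents quantifiers as a separate node type.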

Also note that "characterClassRange" types might also include your "?" character in its range. The following includes a range of characters including "?" (63): http://www.julianviereck.de/regjsparser/#%2F%5B%5Cu003e-%5Cu0040%5D%2Fiu

You could check whether your character's "codePoint" falls between the "codePoint" values of min and max.

{
  "type": "characterClassRange",
  "min": {
    "type": "value",
    "kind": "unicodeEscape",
    "codePoint": 62,
    "range": [
      1,
      7
    ],
    "raw": "\\u003e"
  },
  "max": {
    "type": "value",
    "kind": "unicodeEscape",
    "codePoint": 64,
    "range": [
      8,
      14
    ],
    "raw": "\\u0040"
  },
  "range": [
    1,
    14
  ],
  "raw": "\\u003e-\\u0040"
}
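
A minimal sketch of that range check, assuming nodes shaped like the one above. The helper name rangeIncludes is hypothetical and not part of regjsparser, and the sample node is the one shown above trimmed to the fields the check needs.

```javascript
// Sketch: report whether a "characterClassRange" node covers a code point.
// rangeIncludes is a hypothetical helper, not a regjsparser API.
function rangeIncludes(node, codePoint) {
  return (
    node.type === 'characterClassRange' &&
    node.min.codePoint <= codePoint &&
    codePoint <= node.max.codePoint
  );
}

// The node for [\u003e-\u0040] from above, without the "range" offsets.
const rangeNode = {
  type: 'characterClassRange',
  min: { type: 'value', kind: 'unicodeEscape', codePoint: 62, raw: '\\u003e' },
  max: { type: 'value', kind: 'unicodeEscape', codePoint: 64, raw: '\\u0040' },
  raw: '\\u003e-\\u0040',
};

console.log(rangeIncludes(rangeNode, 63)); // true: 62 <= 63 <= 64
console.log(rangeIncludes(rangeNode, 97)); // false: "a" (97) is outside the range
```

This only covers explicit ranges. Whether you also want to warn about negated classes that happen to admit "?" is a separate decision this sketch does not make.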

Obviously, check other test cases for other "types" that might include your character, but generally using an AST to perform these checks will improve how you "catch" them ("Gotta Catch 'Em All").

Also note there is a JS library to generate regular expressions from the AST:
https://www.npmjs.com/package/regjsgen
https://github.com/bnjmnt4n/regjsgen
