Is this regex the most efficient way of parsing my string?

First off, here are the rules for the string I allow the user to input:

  • If there is a slash, it must appear at the start of the string and nowhere else. It is optional, limited to one, and must be followed by [a-zA-Z].
  • If there is a tilde, it must appear directly after a space " " and nowhere else. It is optional and must be followed by [a-zA-Z]. At most two of these are allowed (i.e. ~exa ~mple passes, but ~exa ~mp ~le does not).
  • A slash followed by a word is an instruction, like /get or /post.
  • A tilde followed by a word is a parameter, like ~now or ~later.

String format:

  • [instruction] (optional) [query] [extra parameters] (optional)
  • [instruction] - a / followed only by [a-zA-Z]
  • [query] - can contain [\w\s()'-] (word characters, i.e. letters, digits and underscore; whitespace; parentheses; apostrophe; dash)
  • [extra parameters] - a ~ preceded by whitespace and followed only by [a-zA-Z]

String examples that should work:

/get D0cUm3nt   ex4Mpl3'  ~now
D0cUm3nt  ex4Mpl3'
/post T(h)(i5  s(h)ou__ld w0rk t0-0'

String examples that shouldn't work:

//get document~now
~later
example ~now~later

Before I pass the string to the regex, I trim any whitespace at the start and end of the string, but I don't collapse double spaces within the string, because some queries require them.
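If the input is handled in JavaScript (an assumption; the question does not name a language), `String.prototype.trim()` does exactly this kind of trimming: it strips whitespace at both ends and leaves internal runs of spaces alone. A minimal sketch with a made-up input string:

```javascript
// Hypothetical raw input: stray whitespace at both ends, a double space inside.
const raw = "   /get D0cUm3nt  ex4Mpl3' ~now  ";

// trim() removes leading and trailing whitespace only;
// the double space between the two query words is kept.
const cleaned = raw.trim();

console.log(JSON.stringify(cleaned)); // "/get D0cUm3nt  ex4Mpl3' ~now"
```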

Here is the regex I used:

^(/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$

To break it down slightly:

[instruction check] - (/{0,1}[a-zA-Z])?
[query check] - [\w\s()'-]*
[parameter check] - ((\s~[a-zA-Z]*){0,2})?
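One way to check the regex against the examples above is to run it in JavaScript (my choice of engine for illustration; the question does not say which one it targets). The sketch also prints what the capturing groups hold for the first example, which shows that the "instruction check" only captures the slash and the first letter, while the rest of the word falls into the query part.

```javascript
// The regex from the question, written as a JavaScript regex literal.
const original = /^(\/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$/;

// Example inputs copied from the question.
const shouldWork = [
  "/get D0cUm3nt   ex4Mpl3'  ~now",
  "D0cUm3nt  ex4Mpl3'",
  "/post T(h)(i5  s(h)ou__ld w0rk t0-0'",
];
const shouldFail = ["//get document~now", "~later", "example ~now~later"];

// Print whether the regex matched, next to each input.
for (const s of shouldWork) console.log("should match:", original.test(s), JSON.stringify(s));
for (const s of shouldFail) console.log("should fail: ", original.test(s), JSON.stringify(s));

// What the groups captured for the first example.
const m = original.exec(shouldWork[0]);
console.log(JSON.stringify(m[1])); // "/g"     (instruction check: slash plus ONE letter)
console.log(JSON.stringify(m[2])); // " ~now"  (parameter check)
```

All three valid examples match and all three invalid ones are rejected. One gap: because of the final `[a-zA-Z]*`, an input such as `example ~` (a tilde with no letters after it) is accepted as well.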

This is the first serious regex I've written outside of a tutorial, so I'm wondering: is there anything I can change in it to make it more compact/efficient?

All fresh perspectives are appreciated!

Thanks.

Starting from your regex, ^(/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$, you can change {0,1} to ?, which is shorthand for "0 or 1 times":

^(/?[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$

The last group already matches 0, 1 or 2 times, so the ? after it is superfluous:

^(/?[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$

The first part can be simplified too: the ? just after the / is superfluous, because a single letter without a slash is matched by the query part [\w\s()'-]* anyway:

^(/[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$
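A quick spot check (not a proof) that dropping the `?` after the slash does not change which strings match: the sketch below runs the regex before and after that step, written as JavaScript literals, over the question's examples plus two short inputs of my own.

```javascript
// The regex before and after removing the "?" that follows the slash.
const before = /^(\/?[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$/;
const after = /^(\/[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$/;

// The question's examples plus two hypothetical edge cases ("a" and "/").
const inputs = [
  "/get D0cUm3nt   ex4Mpl3'  ~now",
  "D0cUm3nt  ex4Mpl3'",
  "/post T(h)(i5  s(h)ou__ld w0rk t0-0'",
  "//get document~now",
  "~later",
  "example ~now~later",
  "a",
  "/",
];

// Both columns should agree on every line.
for (const s of inputs) {
  console.log(before.test(s), after.test(s), JSON.stringify(s));
}
```

Only what gets captured changes: with `/?`, a leading letter without a slash lands in the first group; without it, that letter is matched by the query class instead.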

If you don't use the captured groups, you can change them to non-capturing groups (?: ), which are slightly more efficient because the engine does not have to record what they matched:

^(?:/[a-zA-Z])?[\w\s()'-]*(?:\s~[a-zA-Z]*){0,2}$
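The difference is visible in JavaScript's `exec()` result, which holds the whole match followed by one entry per capturing group. A small illustration using the question's first example:

```javascript
const capturing = /^(\/[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$/;
const nonCapturing = /^(?:\/[a-zA-Z])?[\w\s()'-]*(?:\s~[a-zA-Z]*){0,2}$/;

const input = "/get D0cUm3nt   ex4Mpl3'  ~now";

// Index 0 is the whole match; each capturing group adds one more entry.
console.log(capturing.exec(input).length);    // 3
console.log(nonCapturing.exec(input).length); // 1
```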

You can also use the case-insensitive modifier (?i):

^(?i)(?:/[a-z])?[\w\s()'-]*(?:\s~[a-z]*){0,2}$
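A leading `(?i)` is understood by flavors such as Java and PCRE, but JavaScript has no standalone `(?i)` prefix. Assuming the regex is used from JavaScript, the equivalent is the `i` flag after the closing slash; a minimal sketch with two made-up inputs:

```javascript
// Same pattern as above, with the i flag in place of (?i).
const caseInsensitive = /^(?:\/[a-z])?[\w\s()'-]*(?:\s~[a-z]*){0,2}$/i;

console.log(caseInsensitive.test("/GET D0cUm3nt ~NOW")); // true
console.log(caseInsensitive.test("/get ~exa ~mp ~le"));  // false (three parameters)
```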

Finally, as stated in the question, ~ must be followed by [a-zA-Z], so change the last * to +:

^(?i)(?:/[a-z])?[\w\s()'-]*(?:\s~[a-z]+){0,2}$
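To check the final version, the sketch below runs it (as a JavaScript literal with the `i` flag instead of `(?i)`) over the question's examples plus a few extra inputs of my own: a tilde with no letters, two parameters and three parameters.

```javascript
// Final regex from this answer; the i flag replaces (?i) in JavaScript.
const finalRegex = /^(?:\/[a-z])?[\w\s()'-]*(?:\s~[a-z]+){0,2}$/i;

// [input, expected result]
const cases = [
  ["/get D0cUm3nt   ex4Mpl3'  ~now", true],
  ["D0cUm3nt  ex4Mpl3'", true],
  ["/post T(h)(i5  s(h)ou__ld w0rk t0-0'", true],
  ["//get document~now", false],
  ["~later", false],
  ["example ~now~later", false],
  ["example ~", false],         // tilde with no letters: now rejected thanks to +
  ["find ~exa ~mple", true],    // two parameters: allowed
  ["find ~exa ~mp ~le", false], // three parameters: rejected
];

for (const [input, expected] of cases) {
  const ok = finalRegex.test(input) === expected;
  console.log(ok ? "ok  " : "FAIL", JSON.stringify(input));
}
```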

This looks slightly better: 这看起来稍微好一点:

^(?:/?[a-zA-Z]*\s)?[\w\s()'-]*(?:\s~[a-zA-Z]*)*$
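Note that this version is looser than the rules in the question: the trailing `*` allows any number of parameters, and `[a-zA-Z]*` lets a bare `~` through. A small check in JavaScript, using hypothetical inputs:

```javascript
// Regex from this answer, written as a JavaScript regex literal.
const looser = /^(?:\/?[a-zA-Z]*\s)?[\w\s()'-]*(?:\s~[a-zA-Z]*)*$/;

console.log(looser.test("find ~exa ~mp ~le")); // true  (three parameters accepted)
console.log(looser.test("find ~"));            // true  (tilde with no letters accepted)
console.log(looser.test("~later"));            // false (still rejected)
```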

https://codereview.stackexchange.com/ is a better place for this kind of question.

Assuming that capture groups are useful to you:

^((?:\/|\s~)[a-z]+)?([\w\s()'-]+)(~[a-z]+)?$
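A sketch of what the three groups capture for the question's first example, written in JavaScript and using `[a-z]` for the letter class (a class written as `[az]` would only match the letters "a" and "z"):

```javascript
// Capturing version: group 1 = instruction, group 2 = query, group 3 = parameter.
const withGroups = /^((?:\/|\s~)[a-z]+)?([\w\s()'-]+)(~[a-z]+)?$/;

const m = withGroups.exec("/get D0cUm3nt   ex4Mpl3'  ~now");
console.log(JSON.stringify(m[1])); // "/get"
console.log(JSON.stringify(m[2])); // " D0cUm3nt   ex4Mpl3'  "
console.log(JSON.stringify(m[3])); // "~now"
```

The query group keeps the surrounding spaces, so it may need its own `trim()` before use.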

Regex101 Demo

Maybe this is what you want:

const regex = /^((\/)?[a-zA-Z]+)?[\w\s()'-]*((\s~)?[a-zA-Z]+){0,2}$/;
