Splitting comma separated regex input string

Given a comma-separated string of regexes, for example:

^test\d+.txt$, ^[-,0-9]+$ // provided by user

I want to split it by a comma and get

  • ^test\\d+.txt$
  • ^[-,0-9]+$

instead of

  • ^test\\d+.txt$
  • ^[-
  • 0-9]+$

How can I achieve this?

As you can tell from the comments, ',' is both a valid symbol within a regex and, depending on the flavor, a valid delimiter in its own right (for example in a quantifier such as {1,3}). As such, it is not possible to unambiguously separate regexes delimited by ','. I suspect it would not matter what character you used: there would still be ambiguity between the symbol chosen as a delimiter and an expression that contains that character. In your example you comment that the input is provided by user. If you have control over this input process, you might be able to change the delimiter to something different. For example, if you know that only one language is in play, you can use a unique character from a different language (assuming code is inserting the character and not the user directly). Of course, the best solution is to pass this information in an array (one regex per element).
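To make the ambiguity and the array alternative concrete, here is a minimal Python sketch. The JSON payload and the load_patterns helper are hypothetical, chosen only for illustration; the question does not say how the input actually reaches the code or which language is in use.

```python
import json
import re

# The example input from the question, as one comma-separated string.
raw = r"^test\d+.txt$, ^[-,0-9]+$"

# Naive splitting cuts through the character class [-,0-9],
# so it yields three pieces instead of the two intended regexes.
naive = [part.strip() for part in raw.split(",")]


def load_patterns(payload):
    """Parse a JSON array of regex strings and compile each element."""
    items = json.loads(payload)
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("expected a JSON array of strings")
    return [re.compile(item) for item in items]


# Hypothetical payload: the same two regexes, one per array element,
# so no delimiter ever has to be guessed.
payload = json.dumps([r"^test\d+.txt$", r"^[-,0-9]+$"])
patterns = load_patterns(payload)
```

With an array, the comma inside the second regex is just data, and each element can be validated on its own when it is compiled.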

In the end it depends on a lot of things. Regexes are vulnerable to problems like catastrophic backtracking, which can be a problem when you allow arbitrary regex processing. Again, if you have control of the input process, you might consider providing users a simpler template-like language, so input like "AAAAA 99" (by user) could be converted to "[\\w]{5} [\\d]{2}" by code. This is simpler for the user and gives you more control over how much chaos user input can create.
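A rough Python sketch of such a template conversion follows. The template alphabet ('A' for a word character, '9' for a digit, everything else literal) is an assumption inferred from the "AAAAA 99" example, and template_to_regex is a hypothetical name.

```python
import re
from itertools import groupby

# Assumed template alphabet: 'A' means a word character, '9' means a digit.
# Any other character is treated as a literal and escaped.
TOKEN_CLASSES = {"A": r"[\w]", "9": r"[\d]"}


def template_to_regex(template):
    """Convert a simple user template into a regex pattern string."""
    parts = []
    for char, run in groupby(template):
        count = len(list(run))  # length of the run of identical characters
        if char in TOKEN_CLASSES:
            parts.append(f"{TOKEN_CLASSES[char]}{{{count}}}")
        else:
            parts.append(re.escape(char) * count)
    return "".join(parts)


pattern = template_to_regex("AAAAA 99")
```

Because literals go through re.escape, the generated text may differ slightly from a hand-written pattern (the space is escaped, for instance), but users can never inject regex syntax of their own.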

If you cannot modify or control the input process, then you should teach your users to write regular expressions, perhaps by pointing them to the quick reference, so they can create a single expression that does everything they want.
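As one possible illustration in Python, the two regexes from the question could be written by the user as a single expression using alternation. This is an assumption about what "a single expression" might look like here, not something the question specifies.

```python
import re

# One user-written expression: either alternative may match.
# The comma inside [-,0-9] is no longer a problem because nothing is split.
single = re.compile(r"^test\d+.txt$|^[-,0-9]+$")

matches_file = single.search("test7.txt") is not None
matches_list = single.search("1,2,3") is not None
matches_other = single.search("hello") is not None
```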

Anytime you run an unknown regex, you should use the methods that let you provide a timeout, to protect against regexes that run longer than you are willing to wait.
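Python's standard re module has no timeout parameter, so the sketch below swaps in a different technique: it runs the match in a child interpreter and lets subprocess.run enforce the time limit. The search_with_timeout helper and the exit codes are hypothetical, and a regex library with a built-in timeout would be the more direct choice where one is available.

```python
import subprocess
import sys

# Child script: exit code 10 means match, 11 means no match.
# Any other exit code (e.g. an invalid pattern) is treated as a failure.
_CHILD = (
    "import re, sys\n"
    "sys.exit(10 if re.search(sys.argv[1], sys.argv[2]) else 11)\n"
)


def search_with_timeout(pattern, text, seconds=1.0):
    """Return True or False for match or no match, or None on timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            timeout=seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode not in (10, 11):
        raise ValueError("pattern could not be evaluated")
    return result.returncode == 10


ok = search_with_timeout(r"^\d+$", "12345", seconds=10.0)
# A pattern with nested quantifiers that backtracks heavily on this input.
slow = search_with_timeout(r"^(a+)+$", "a" * 40 + "b", seconds=0.5)
```

Starting a process per match is expensive, so this only suits cases where untrusted patterns are evaluated occasionally.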

Summary: There's no good answer. There is ambiguity between what you chose as a delimiter and what a valid regex can contain. Your best bet is to change the form of the input or the behavior of the user.


 