How can I concatenate regex literals in JavaScript?

Is it possible to do something like this?

var pattern = /some regex segment/ + /* comment here */
    /another segment/;

Or do I have to use the new RegExp() syntax and concatenate strings? I'd prefer to use the literal, as the code is both more self-evident and more concise.

Here is how to create a regular expression without using the regular expression literal syntax. This lets you do arbitrary string manipulation before it becomes a regular expression object:

var segment_part = "some bit of the regexp";
var pattern = new RegExp("some regex segment" + /*comment here */
              segment_part + /* that was defined just now */
              "another segment");
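
One thing to watch when building the pattern from strings is escaping: a backslash inside an ordinary string literal has to be doubled, or the string can be written with String.raw. A minimal sketch, where the variable names and the sample pattern are made up for illustration:

```javascript
// Hypothetical segments; the names are illustrative only.
var digits = "\\d+";                  // the string \d+ (backslash doubled in a normal string)
var separator = String.raw`\s*-\s*`;  // String.raw keeps backslashes as typed (ES2015+)
var rangePattern = new RegExp("^" + digits + separator + digits + "$");

console.log(rangePattern.source);            // ^\d+\s*-\s*\d+$
console.log(rangePattern.test("10 - 20"));   // true
console.log(rangePattern.test("10 to 20"));  // false
```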

If you have two regular expression literals, you can in fact concatenate them using this technique:

var regex1 = /foo/g;
var regex2 = /bar/y;
var flags = (regex1.flags + regex2.flags).split("").sort().join("").replace(/(.)(?=.*\1)/g, "");
var regex3 = new RegExp(regex1.source + regex2.source, flags);
// regex3 is now /foobar/gy

It's just wordier than having expressions one and two be literal strings instead of literal regular expressions.
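
If this comes up often, the technique above could be wrapped in a small function. The following is only a sketch: joinRegexes is a made-up name, and it assumes an engine that supports RegExp.prototype.flags and Set (ES2015 or later).

```javascript
// Hypothetical helper: joins the sources of any number of RegExp objects
// and merges their flags, keeping each flag only once.
function joinRegexes() {
  var sources = "";
  var flags = "";
  for (var i = 0; i < arguments.length; i++) {
    sources += arguments[i].source;
    flags += arguments[i].flags; // .flags needs an ES2015+ engine
  }
  // A Set built from a string holds each character once.
  var uniqueFlags = Array.from(new Set(flags)).sort().join("");
  return new RegExp(sources, uniqueFlags);
}

var combined = joinRegexes(/foo/g, /bar/i, /baz/gi);
console.log(combined.source); // foobarbaz
console.log(combined.flags);  // gi
```

Merging flags is a design choice, not a rule: a flag such as i changes how every joined segment matches, so it may be safer to pass the flags you want explicitly.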

Concatenating regular expression objects directly with + can have adverse side effects, because each object is converted to a string with its slashes and flags included. Use RegExp.source instead:

var r1 = /abc/g;
var r2 = /def/;
var r3 = new RegExp(r1.source + r2.source,
    (r1.global ? 'g' : '') +
    (r1.ignoreCase ? 'i' : '') +
    (r1.multiline ? 'm' : ''));
console.log(r3);
var m = 'test that abcdef and abcdef has a match?'.match(r3);
console.log(m); // m should contain 2 matches

This also lets you carry over the flags from an existing RegExp by passing them as the second argument of the constructor.

I don't quite agree with the "eval" option.

var xxx = /abcd/;
var yyy = /efgh/;
var zzz = new RegExp(eval(xxx)+eval(yyy));

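will give "//abcd//efgh//", which is not the intended result: eval returns each RegExp object unchanged, so the + converts them to strings and the slashes of both literals end up inside the pattern.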

Using source, like

var zzz = new RegExp(xxx.source+yyy.source);

will give "/abcdefgh/", which is correct.

Logically there is no need to EVALUATE; you already know your EXPRESSION. You just need its SOURCE, that is, how it is written, not necessarily its value. As for the flags, you just need to use the optional second argument of RegExp.

In my situation, I do run into the issue of ^ and $ being used in several expressions I am trying to concatenate! Those expressions are grammar filters used across the program. Now I want to use some of them together to handle the case of PREPOSITIONS. I may have to "slice" the sources to remove the starting ^( and/or the ending )$ :) Cheers, Alex.
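
The "slicing" mentioned above could look roughly like this. It is only a sketch: stripAnchors and the two sample filters are made up, and the helper assumes the only anchors to remove are a leading ^ and a trailing $ that is not written as \$.

```javascript
// Hypothetical helper: returns the source of a RegExp without its leading "^"
// and without a trailing "$" (unless that "$" is escaped as "\$").
function stripAnchors(regex) {
  var source = regex.source;
  if (source.charAt(0) === "^") {
    source = source.slice(1);
  }
  var endsWithAnchor = source.slice(-1) === "$" && source.slice(-2, -1) !== "\\";
  if (endsWithAnchor) {
    source = source.slice(0, -1);
  }
  return source;
}

// Made-up grammar filters, each anchored on its own.
var article = /^(the|a|an)$/;
var preposition = /^(of|in|on)$/;

// Re-anchor once around the combined pattern.
var phrase = new RegExp(
  "^" + stripAnchors(preposition) + "\\s+" + stripAnchors(article) + "$"
);

console.log(phrase.source);          // ^(of|in|on)\s+(the|a|an)$
console.log(phrase.test("in the"));  // true
console.log(phrase.test("the in"));  // false
```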

Problem: the regexp may contain back-references like \1.

var r = /(a|b)\1/  // Matches aa, bb but nothing else.
var p = /(c|d)\1/   // Matches cc, dd but nothing else.

Then just concatenating the sources will not work. Indeed, the combination of the two is:

var rp = /(a|b)\1(c|d)\1/
rp.test("aadd") // Returns false

The solution: first we count the number of capturing groups in the first regex, then for each back-reference in the second, we increment its number by that count.

function concatenate(r1, r2) {
  var count = function(r, str) {
    // match() returns null when there is no match, so fall back to an empty array.
    return (str.match(r) || []).length;
  }
  var numberGroups = /([^\\]|^)(?=\((?!\?:))/g; // Home-made regexp to count groups.
  var offset = count(numberGroups, r1.source);    
  var escapedMatch = /[\\](?:(\d+)|.)/g;        // Home-made regexp for escaped literals, greedy on numbers.
  var r2newSource = r2.source.replace(escapedMatch, function(match, number) { return number?"\\"+(number-0+offset):match; });
  return new RegExp(r1.source+r2newSource,
      (r1.global ? 'g' : '') 
      + (r1.ignoreCase ? 'i' : '')
      + (r1.multiline ? 'm' : ''));
}

Test:

var rp = concatenate(r, p) // returns  /(a|b)\1(c|d)\2/
rp.test("aadd") // Returns true
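
The home-made counting regexp above also counts lookaheads such as (?=...) as groups. One alternative is to let the regex engine do the counting. This is a sketch, not part of the answer above, and countGroups is a made-up name: appending "|" to the source adds an empty alternative, so exec("") always succeeds, and the result array has one slot per capturing group plus one for the whole match.

```javascript
// Hypothetical helper: counts capturing groups by letting the engine parse
// the pattern. The flags are reused so the source is parsed the same way.
function countGroups(regex) {
  return new RegExp(regex.source + "|", regex.flags).exec("").length - 1;
}

console.log(countGroups(/(a|b)\1/));        // 1
console.log(countGroups(/(?:a|b)(?=c)/));   // 0
console.log(countGroups(/\((a)(b(c))\)/));  // 3
```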

Provided that:

  • you know what you are doing in your regexp;
  • you have many regex pieces that form one pattern, and they will all use the same flags;
  • you find it more readable to separate your small pattern chunks into an array;
  • you also want to be able to comment each part for the next dev or for yourself later;
  • you prefer to visually simplify your regex, like /this/g rather than new RegExp('this', 'g');
  • it's OK for you to assemble the regex in an extra step rather than having it in one piece from the start;

Then you may like to write it this way:

var regexParts =
    [
        /\b(\d+|null)\b/,// Some comments.
        /\b(true|false)\b/,
        /\b(new|getElementsBy(?:Tag|Class|)Name|arguments|getElementById|if|else|do|null|return|case|default|function|typeof|undefined|instanceof|this|document|window|while|for|switch|in|break|continue|length|var|(?:clear|set)(?:Timeout|Interval))(?=\W)/,
        /(\$|jQuery)/,
        /many more patterns/
    ],
    regexString  = regexParts.map(function(x){return x.source}).join('|'),
    regexPattern = new RegExp(regexString, 'g');

You can then do something like:

string.replace(regexPattern, function()
{
    var m = arguments,
        Class = '';

    switch(true)
    {
        // Numbers and 'null'.
        case (Boolean)(m[1]):
            m = m[1];
            Class = 'number';
            break;

        // True or False.
        case (Boolean)(m[2]):
            m = m[2];
            Class = 'bool';
            break;

        // Keywords.
        case (Boolean)(m[3]):
            m = m[3];
            Class = 'keyword';
            break;

        // $ or 'jQuery'.
        case (Boolean)(m[4]):
            m = m[4];
            Class = 'dollar';
            break;

        // More cases...
    }

    return '<span class="' + Class + '">' + m + '</span>';
})

In my particular case (a CodeMirror-like editor), it is much easier to run one big regex than a series of replaces like the following. Each time I wrap an expression in an HTML tag, the next pattern becomes harder to target without affecting the HTML tag itself (and without a good lookbehind, which JavaScript unfortunately did not support at the time of writing):

.replace(/(\b\d+|null\b)/g, '<span class="number">$1</span>')
.replace(/(\btrue|false\b)/g, '<span class="bool">$1</span>')
.replace(/\b(new|getElementsBy(?:Tag|Class|)Name|arguments|getElementById|if|else|do|null|return|case|default|function|typeof|undefined|instanceof|this|document|window|while|for|switch|in|break|continue|var|(?:clear|set)(?:Timeout|Interval))(?=\W)/g, '<span class="keyword">$1</span>')
.replace(/\$/g, '<span class="dollar">$</span>')
.replace(/([\[\](){}.:;,+\-?=])/g, '<span class="ponctuation">$1</span>')

It would be preferable to use the literal syntax as often as possible. It's shorter, more legible, and you do not need to escape quotes or double-escape backslashes. From "JavaScript Patterns", Stoyan Stefanov, 2010.

But using new RegExp may be the only way to concatenate.

I would avoid eval. It's not safe.

You have to use new RegExp !-)

You could do something like:

function concatRegex(...segments) {
  return new RegExp(segments.join(''));
}

The segments would be strings (rather than regex literals) passed in as separate arguments.
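
If you would rather pass literals, a variant could accept both. This is only a sketch under that assumption: concatSegments is a made-up name, and it drops any flags on the segments, so the result is always flagless.

```javascript
// Hypothetical variant of concatRegex: each segment may be a string or a RegExp.
function concatSegments(...segments) {
  const source = segments
    .map((segment) => (segment instanceof RegExp ? segment.source : String(segment)))
    .join("");
  return new RegExp(source);
}

const datePattern = concatSegments(/\d{4}/, "-", /\d{2}/, "-", /\d{2}/);
console.log(datePattern.source);             // \d{4}-\d{2}-\d{2}
console.log(datePattern.test("2024-01-31")); // true
```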

You can concatenate the regex source from both the literal and the RegExp class:

var xxx = new RegExp(/abcd/);
var zzz = new RegExp(xxx.source + /efgh/.source);

No, the literal way is not supported. You'll have to use RegExp.

Use the constructor with 2 params, passing the flags separately, and avoid the problem with the trailing '/':

var re_final = new RegExp("\\" + ".", "g");    // constructor can have 2 params!
console.log("...finally".replace(re_final, "!") + "\n" + re_final + 
    " works as expected...");                  // !!!finally works as expected

                         // meanwhile

re_final = new RegExp("\\" + "." + "g");              // "g" becomes part of the pattern: /\.g/
console.log("... finally".replace(re_final, "!"));    // ... finally (nothing replaced)
console.log(re_final, "does not work!");              // does not work

The easiest way for me would be to concatenate the sources, e.g.:

a = /\d+/
b = /\w+/
c = new RegExp(a.source + b.source)

The value of c will be:

/\d+\w+/

I prefer to use eval('your expression') because it does not add the / on each end, which new RegExp does.
