Named capturing groups in JavaScript regex?

As far as I know there is no such thing as named capturing groups in JavaScript. What is the alternative way to get similar functionality?

ECMAScript 2018 introduces named capturing groups into JavaScript regexes.

Example:

  const auth = 'Bearer AUTHORIZATION_TOKEN'
  const { groups: { token } } = /Bearer (?<token>[^ $]*)/.exec(auth)
  console.log(token) // prints "AUTHORIZATION_TOKEN"
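Note that destructuring the result directly throws a TypeError when the input does not match, because exec() returns null in that case. A minimal sketch of a guarded variant follows; the helper name extractToken is hypothetical and not part of the original answer.

```javascript
// extractToken is a hypothetical helper name used only for this sketch.
function extractToken(header) {
  const match = /Bearer (?<token>[^ $]*)/.exec(header);
  // exec() returns null when nothing matches, so check before reading groups
  return match ? match.groups.token : null;
}

console.log(extractToken('Bearer AUTHORIZATION_TOKEN')); // AUTHORIZATION_TOKEN
console.log(extractToken('Basic abc123'));               // null
```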

If you need to support older browsers, you can do everything with normal (numbered) capturing groups that you can do with named capturing groups; you just need to keep track of the numbers, which may be cumbersome if the order of the capturing groups in your regex changes.

There are only two "structural" advantages of named capturing groups I can think of:

  1. In some regex flavors (.NET and JGSoft, as far as I know), you can use the same name for different groups in your regex (see here for an example where this matters). But most regex flavors do not support this functionality anyway.

  2. If you need to refer to numbered capturing groups in a situation where they are surrounded by digits, you can run into a problem. Let's say you want to add a zero to a digit and therefore want to replace (\d) with $10. In JavaScript, this will work (as long as you have fewer than 10 capturing groups in your regex), but Perl will think you're looking for backreference number 10 instead of number 1, followed by a 0. In Perl, you can use ${1}0 in this case.
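The second point can be illustrated with a short sketch. The variable and group names are illustrative only, and the numbered case relies on the regex having fewer than 10 capturing groups, as described above.

```javascript
// One capturing group, so "$10" is read as group 1 followed by a literal "0".
const numbered = '5'.replace(/(\d)/, '$10');

// With a named group (ES2018) the reference cannot be confused with digits.
const named = '5'.replace(/(?<digit>\d)/, '$<digit>0');

console.log(numbered); // 50
console.log(named);    // 50
```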

Other than that, named capturing groups are just "syntactic sugar". It helps to use capturing groups only when you really need them and to use non-capturing groups (?:...) in all other circumstances.
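As a small illustration of that advice (the pattern and input are made up for this sketch), a non-capturing group keeps the alternation out of the result, so the one value you care about stays at index 1:

```javascript
// The title alternation is grouped with (?:...) and therefore not captured;
// only the surname occupies a capture slot.
const surnameMatch = /(?:Mr|Ms|Dr)\. (\w+)/.exec('Dr. Smith');

console.log(surnameMatch.length); // 2 (whole match plus one capture)
console.log(surnameMatch[1]);     // Smith
```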

The bigger problem (in my opinion) with JavaScript is that it does not support verbose regexes which would make the creation of readable, complex regular expressions a lot easier.

Steve Levithan's XRegExp library solves these problems.
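Without a library, one possible workaround for the missing verbose mode is to assemble the pattern from commented string fragments. This is only a sketch and not a true free-spacing mode: every backslash has to be doubled inside the string literals, and the name datePattern is illustrative.

```javascript
// Each fragment gets its own line and comment; join('') glues them together.
const datePattern = new RegExp([
  '(\\d{4})', // year
  '-',
  '(\\d{2})', // month
  '-',
  '(\\d{2})'  // day
].join(''));

const parts = datePattern.exec('2015-01-02');
console.log(parts[1], parts[2], parts[3]); // 2015 01 02
```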

Another possible solution: create an object containing the group names and indexes.

var regex = new RegExp("(.*) (.*)");
var regexGroups = { FirstName: 1, LastName: 2 };

Then, use the object keys to reference the groups:

var m = regex.exec("John Smith");
var f = m[regexGroups.FirstName];

This improves the readability/quality of the code using the results of the regex, but not the readability of the regex itself.
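The same name-to-index object can also be used to build a result object in one step. The following is a sketch of that idea; toNamedResult, nameRegex and nameGroups are hypothetical names introduced here.

```javascript
// toNamedResult is a hypothetical helper for this sketch: it copies each
// numbered capture into a property named after its entry in groupIndexes.
function toNamedResult(match, groupIndexes) {
  if (!match) return null; // exec() returned null: nothing matched
  var named = {};
  Object.keys(groupIndexes).forEach(function (name) {
    named[name] = match[groupIndexes[name]];
  });
  return named;
}

var nameRegex = /(.*) (.*)/;
var nameGroups = { FirstName: 1, LastName: 2 };
var person = toNamedResult(nameRegex.exec('John Smith'), nameGroups);

console.log(person.FirstName); // John
console.log(person.LastName);  // Smith
```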

You can use XRegExp, an augmented, extensible, cross-browser implementation of regular expressions, including support for additional syntax, flags, and methods:

  • Adds new regex and replacement text syntax, including comprehensive support for named capture.
  • Adds two new regex flags: s, to make dot match all characters (aka dotall or singleline mode), and x, for free-spacing and comments (aka extended mode).
  • Provides a suite of functions and methods that make complex regex processing a breeze.
  • Automagically fixes the most commonly encountered cross-browser inconsistencies in regex behavior and syntax.
  • Lets you easily create and use plugins that add new syntax and flags to XRegExp's regular expression language.

In ES6 you can use array destructuring to catch your groups:

let text = '27 months';
let regex = /(\d+)\s*(days?|months?|years?)/;
let [, count, unit] = regex.exec(text) || [];

// count === '27'
// unit === 'months'

Notice:

  • the first comma in the last let skips the first value of the resulting array, which is the whole matched string
  • the || [] after .exec() will prevent a destructuring error when there are no matches (because .exec() will return null)

Update: It finally made it into JavaScript (ECMAScript 2018)!


Named capturing groups could make it into JavaScript very soon.
The proposal for it is at stage 3 already.

A capture group can be given a name inside angle brackets using the (?<name>...) syntax, for any identifier name. The regular expression for a date then can be written as /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u. Each name should be unique and follow the grammar for ECMAScript IdentifierName.

Named groups can be accessed from properties of a groups property of the regular expression result. Numbered references to the groups are also created, just as for non-named groups. For example:

let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = re.exec('2015-01-02');
// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';

// result[0] === '2015-01-02';
// result[1] === '2015';
// result[2] === '01';
// result[3] === '02';
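Named groups are also available in String.prototype.replace, both as $<name> in a replacement string and as a groups object passed as the last argument to a replacer function. A short sketch using the same date pattern (the variable names are illustrative):

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// $<name> in the replacement string refers to a named group.
const swapped = '2015-01-02'.replace(dateRe, '$<day>/$<month>/$<year>');
console.log(swapped); // 02/01/2015

// A replacer function receives the groups object as its last argument.
const viaFunction = '2015-01-02'.replace(dateRe, (...args) => {
  const { year, month, day } = args[args.length - 1];
  return `${month}/${day}/${year}`;
});
console.log(viaFunction); // 01/02/2015
```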

Named capture groups provide one thing: less confusion with complex regular expressions.

It really depends on your use-case but maybe pretty-printing your regex could help.

Or you could try and define constants to refer to your captured groups.

Comments might then also help to show others who read your code what you have done.

For the rest I must agree with Tim's answer.

As Tim Pietzcker said, ECMAScript 2018 introduces named capturing groups into JavaScript regexes. But what I did not find in the above answers was how to use the named captured group in the regex itself.

You can refer back to a named captured group with this syntax: \k<name>. For example:

var regexObj = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/

And as Forivin said, you can use the captured groups in the result object as follows:

let result = regexObj.exec('2019-28-06 year is 2019');
// result.groups.year === '2019';
// result.groups.month === '06';
// result.groups.day === '28';
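To make the effect of \k<year> explicit: the backreference must match the same text the year group captured, so a string with a different trailing year is rejected. A brief sketch (dateWithYear is an illustrative name):

```javascript
const dateWithYear = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/;

// \k<year> has to repeat exactly what the "year" group matched.
console.log(dateWithYear.test('2019-28-06 year is 2019')); // true
console.log(dateWithYear.test('2019-28-06 year is 2020')); // false
```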

// No g flag here: with g, exec() would resume from lastIndex on the
// next click and return null for the same input.
var regexObj = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/mi;

function check() {
  var inp = document.getElementById("tinput").value;
  // exec() returns null when the input does not match
  var result = regexObj.exec(inp);
  var groups = result ? result.groups : {};
  document.getElementById("year").innerHTML = groups.year || "";
  document.getElementById("month").innerHTML = groups.month || "";
  document.getElementById("day").innerHTML = groups.day || "";
}

td, th { border: solid 2px #ccc; }

<input id="tinput" type="text" value="2019-28-06 year is 2019"/>
<br/> <br/>
<span>Pattern: "(?&lt;year&gt;\d{4})-(?&lt;day&gt;\d{2})-(?&lt;month&gt;\d{2}) year is \k&lt;year&gt;"</span>
<br/> <br/>
<button onclick="check()">Check!</button>
<br/> <br/>
<table>
  <thead>
    <tr>
      <th> <span>Year</span> </th>
      <th> <span>Month</span> </th>
      <th> <span>Day</span> </th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td> <span id="year"></span> </td>
      <td> <span id="month"></span> </td>
      <td> <span id="day"></span> </td>
    </tr>
  </tbody>
</table>

There is a node.js library called named-regexp that you could use in your node.js projects (or in the browser by packaging the library with browserify or other packaging scripts). However, the library cannot be used with regular expressions that contain non-named capturing groups.

If you count the opening capturing parentheses in your regular expression you can create a mapping between named capturing groups and the numbered capturing groups in your regex and can mix and match freely. You just have to remove the group names before using the regex. I've written three functions that demonstrate that. See this gist: https://gist.github.com/gbirke/2cc2370135b665eee3ef
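The idea can be sketched roughly as follows. This is not the code from the gist: compileNamed is a hypothetical helper written for illustration. It assumes the pattern is passed as a string and that group names are simple identifiers.

```javascript
// compileNamed is a hypothetical helper, not the gist's implementation.
// It strips (?<name>...) markers and records each name's capture index.
function compileNamed(source, flags) {
  var names = {};      // group name -> capture index
  var groupIndex = 0;  // running count of capturing groups
  var inClass = false; // true while inside a [...] character class
  var stripped = '';
  for (var i = 0; i < source.length; i++) {
    var ch = source[i];
    if (ch === '\\') { // copy an escaped character untouched
      stripped += ch + (source[i + 1] || '');
      i++;
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false;
      stripped += ch;
      continue;
    }
    if (ch === '[') { inClass = true; stripped += ch; continue; }
    if (ch === '(') {
      var named = /^\(\?<([A-Za-z_$][\w$]*)>/.exec(source.slice(i));
      if (named) { // named group: count it and drop the name
        groupIndex++;
        names[named[1]] = groupIndex;
        stripped += '(';
        i += named[0].length - 1;
        continue;
      }
      // "(" not followed by "?" is a plain capturing group
      if (source[i + 1] !== '?') groupIndex++;
    }
    stripped += ch;
  }
  return { regex: new RegExp(stripped, flags), names: names };
}

var compiled = compileNamed('(?<year>\\d{4})-(\\d{2})-(?<day>\\d{2})');
var dateMatch = compiled.regex.exec('2015-01-02');
console.log(dateMatch[compiled.names.year]); // 2015
console.log(dateMatch[compiled.names.day]);  // 02
```

Because unnamed groups are counted too, named and numbered groups can be mixed, as in the example where the month stays at index 2.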

While you can't do this with vanilla JavaScript, maybe you can use some Array.prototype function like Array.prototype.reduce to turn indexed matches into named ones using some magic.

Obviously, the following solution requires that the matches occur in order:

// @text Contains the text to match
// @regex A regular expression object (e.g. /.+/)
// @matchNames An array of literal strings where each item
//             is the name of each group
function namedRegexMatch(text, regex, matchNames) {
  var matches = regex.exec(text);
  return matches.reduce(function(result, match, index) {
    if (index > 0)
      // This subtraction is required because we count
      // match indexes from 1, because 0 is the entire matched string
      result[matchNames[index - 1]] = match;
    return result;
  }, {});
}

var myString = "Hello Alex, I am John";
var namedMatches = namedRegexMatch(
  myString,
  /Hello ([a-z]+), I am ([a-z]+)/i,
  ["firstPersonName", "secondPersonName"]
);

alert(JSON.stringify(namedMatches));

Don't have ECMAScript 2018?

My goal was to make it work as similarly as possible to what we are used to with named groups. Whereas in ECMAScript 2018 you can place ?<groupname> inside the group to indicate a named group, in my solution for older JavaScript, you can place (?!=<groupname>) inside the group to do the same thing. So it's an extra set of parentheses and an extra != . Pretty close!

I wrapped all of it into a string prototype function.

Features

  • works with older JavaScript
  • no extra code
  • pretty simple to use
  • Regex still works
  • groups are documented within the regex itself
  • group names can have spaces
  • returns object with results

Instructions

  • place (?!=<groupname>) inside each group you want to name
  • remember to turn any groups () that you do not want to name into non-capturing groups by putting ?: at the beginning of that group. These won't be named.

arrays.js

// @@pattern - includes injections of (?!=<groupname>) for each group
// @@returns - an object with a property for each group having the group's match as the value 
String.prototype.matchWithGroups = function (pattern) {
  var matches = this.match(pattern);
  return pattern
  // get the pattern as a string
  .toString()
  // suss out the groups
  .match(/<(.+?)>/g)
  // remove the braces
  .map(function(group) {
    return group.match(/<(.+)>/)[1];
  })
  // create an object with a property for each group having the group's match as the value 
  .reduce(function(acc, curr, index, arr) {
    acc[curr] = matches[index + 1];
    return acc;
  }, {});
};    

usage

function testRegGroups() {
  var s = '123 Main St';
  var pattern = /((?!=<house number>)\d+)\s((?!=<street name>)\w+)\s((?!=<street type>)\w+)/;
  var o = s.matchWithGroups(pattern); // {'house number':"123", 'street name':"Main", 'street type':"St"}
  var j = JSON.stringify(o);
  var housenum = o['house number']; // 123
}

result of o

{
  "house number": "123",
  "street name": "Main",
  "street type": "St"
}
