
Regex groups in JavaScript

I have a problem using a JavaScript RegExp.

This is a very simplified regexp that demonstrates my problem:

(?:\s(\+\d\w*))|(\w+)

This regex should only match strings that don't contain forbidden characters (anything that is not a word character).

The only exception is the symbol +.
A match is allowed to start with this symbol if a digit ([0-9]) follows it. A + must not appear within words (44+44 is not a valid match, but +4ad is).

To allow the + only at the beginning, I required a preceding whitespace character. However, I don't want the whitespace to be part of the match.

I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.

There are two issues with this regexp:

  • If I use it in my JS code, the space is always part of the match
  • If +42 appears at the beginning of a line, it won't be matched

My questions:

  • What should the regex look like?
  • Why does this regex add the space to the matches?

Here's my JS code:

var input =  "+5ad6  +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);

What should the regex look like?

You have several options for reaching your goal:

  • It's fine as you have it, though you might also allow the beginning of the string in place of the whitespace. Just read the capturing groups (groups 1 and 2 of each match) instead of the whole match; they do not include the whitespace. Note that match with the g flag returns only the whole matches, so the groups have to be read with exec in a loop.
  • A lookbehind could help. Unfortunately, it's not supported in JavaScript engines that predate ES2018, which added lookbehind assertions.
  • Require a non-word-boundary before the +, so that any \w character directly before the + prevents the match:

     /\B\+\d\w+|\w+/
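
The three options above can be sketched side by side on the input from the question. This is a minimal illustration, not a definitive implementation: the helper name `tokenizeWithGroups` and the variable names are made up for this example, the `(?:^|\s)` prefix is one assumed way to "allow the string beginning in place of the whitespace", and the lookbehind variant `(?<!\S)` is an assumed pattern that only works in engines with ES2018 lookbehind support.

```javascript
const input = "+5ad6  +5ad6 sd asd+as +we";

// Option 1: let the regex consume the whitespace, but keep only the capture
// groups. (?:^|\s) accepts the start of the string as well as whitespace.
function tokenizeWithGroups(text) {
  const re = /(?:^|\s)(\+\d\w*)|(\w+)/g;
  const tokens = [];
  let m;
  // With the g flag, exec resumes after the previous match on each call.
  while ((m = re.exec(text)) !== null) {
    // Only one of the two groups participates in any given match.
    tokens.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return tokens;
}

// Option 2 (assumes ES2018 lookbehind): the + must not be preceded by a
// non-whitespace character, so the whitespace never enters the match.
const viaLookbehind = input.match(/(?<!\S)\+\d\w*|\w+/g);

// Option 3: the non-word-boundary pattern from the answer, unchanged.
const viaNonWordBoundary = input.match(/\B\+\d\w+|\w+/g);

console.log(tokenizeWithGroups(input));
console.log(viaLookbehind);
console.log(viaNonWordBoundary);
```

For this input, all three variants log `["+5ad6", "+5ad6", "sd", "asd", "as", "we"]`. They are not interchangeable in general: `\B` also succeeds after any non-word character (not just whitespace), and `\d\w+` needs at least one word character after the digit, so a bare `+5` would only yield `5` with the third pattern.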

Why does this regex add the space to the matches?

Because the regex does match the whitespace. In \s(\+\d\w+), however, the whitespace is not added to the captured group; the non-capturing group (?: ) only prevents an extra group from being created, it does not remove anything from the overall match.
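
The difference between the overall match and the captured group can be seen with a small sketch. The sample string and variable names below are made up for illustration; the regex is the one from the question.

```javascript
// Without the g flag, exec returns the whole match plus the capture groups.
const single = /(?:\s(\+\d\w*))|(\w+)/.exec(" +42");

console.log(JSON.stringify(single[0])); // whole match, includes the space
console.log(JSON.stringify(single[1])); // group 1, without the space

// With the g flag, match returns only the whole matches, so the space stays.
const wholeMatches = " +42 abc".match(/(?:\s(\+\d\w*))|(\w+)/g);
console.log(wholeMatches);
```

Here `single[0]` is `" +42"` while `single[1]` is `"+42"`, and `wholeMatches` is `[" +42", "abc"]`, which is the behaviour described in the question.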
