
How do I make my regex match whitespaces without consuming them?

I'm trying to match lines that contain chords, but I need to make sure each match is surrounded by whitespace or is first on the line, without consuming those characters, as I don't want them returned to the caller.

E.g.:

Standard Tuning (Capo on fifth fret)

Time signature: 12/8
Tempo: 1.5 * Quarter note = 68 BPM

Intro: G Em7 G Em7

  G                 Em7
I heard there was a secret chord
     G                   Em7
That David played and it pleased the lord
    C                D              G/B     D
But you don't really care for music, do you? 
        G/B                C          D
Well it goes like this the fourth, the fifth
    Em7                 C
The minor fall and the major lift
    D            B7/D#         Em
The baffled king composing hallelujah

Chorus:

G/A   G/B  C           Em         C             G/B   D/A    G
Hal - le-  lujah, hallelujah, hallelujah, hallelu-u-u-u-jah .... 

My pattern almost works, except that it also matches the "B" in "68 BPM". How do I make sure that chords are matched correctly? I don't want it to match the B in "Before" or the D or E in "SUBSIDE".

This is my algorithm for matching on each separate line:

function getChordMatches(line) {
    var pattern = /[ABCDEFG](?:#|##|b|bb)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[ABCDEFG](?:#|##|b|bb)?)?/g;
    var chords = line.match(pattern);
    var positions = [];
    var match; // declared locally so the loop does not create an implicit global
    while ((match = pattern.exec(line)) != null) {
        positions.push(match.index);
    }

    return {
        "chords":chords,
        "positions":positions
    };
}

That is, I want arrays of the form ["A", "Bm", "C#"] and not [" A", "Bm ", " C# "].

Edit

I made it work using the accepted answer. I had to make some adjustments to accommodate the leading whitespace. Thanks for taking the time, everyone!

function getChordMatches(line) {
    var pattern = /(?:^|\s)[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/g;
    var chords = line.match(pattern);
    var chordLength = -1;
    var positions = [];
    var match; // declared locally so the loop does not create an implicit global

    while ((match = pattern.exec(line)) != null) {
        positions.push(match.index);
    }

    for (var i = 0; chords && i < chords.length; i++) {
        chordLength = chords[i].length;
        chords[i] = chords[i].trim();
        positions[i] -= chords[i].length - chordLength;
    }

    return {
        "chords":chords,
        "positions":positions
    };
}

I assume that you have already split the input into lines, and that the function will process the lines one by one.

You just need to check that the line has a chord as its first item before extracting the chords:

if (/^\s*[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/.test(line)) {
    // Match the chords here
}

I added ^\s* in front to check from the beginning of the line, and added (?!\S) to check that the first chord is followed by a whitespace character ( \s ) or the end of the line.
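As a rough sketch of how that check could be used to pick out chord lines, here it is applied to a few sample lines. The helper name `isChordLine` and the sample array are made up for illustration; only the regex comes from the answer.

```javascript
// Same pattern as the check above, stored once for reuse.
const startsWithChord = /^\s*[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/;

// Hypothetical helper: true when the first token on the line looks like a chord.
function isChordLine(line) {
    return startsWithChord.test(line);
}

const sample = [
    "Tempo: 1.5 * Quarter note = 68 BPM",
    "  G   Em7",
    "I heard there was a secret chord",
    "G/A G/B C Em"
];

// Keeps only the second and fourth entries.
const chordLines = sample.filter(isChordLine);
```

One limitation worth keeping in mind: a lyric line that happens to start with a single letter from A to G followed by a space (for example "A long time ago") also passes this check, because "A" on its own looks like a chord.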

Note that I made some minor changes to your regex, since A## (assuming it is a valid chord) will not be matched in full by your current regex. The regex engine tries the alternatives in the order they are written, so # is attempted first in #|## . It finds that A# matches and returns that match without checking for ## . Either reversing the order ( ##|# ) or using a greedy quantifier ( ##? ) fixes the problem, since the longer alternative is then checked first.
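The effect of alternation order can be seen in isolation with a cut-down pattern. This is only the note plus accidental part, not the full chord regex:

```javascript
const input = "A##";

// "#" is listed first, so it wins and the second "#" is left unmatched.
const shortFirst = input.match(/[A-G](?:#|##)?/)[0];   // "A#"

// Longer alternative first: the whole accidental is matched.
const longFirst = input.match(/[A-G](?:##|#)?/)[0];    // "A##"

// Equivalent, using a greedy optional second "#".
const greedy = input.match(/[A-G](?:##?)?/)[0];        // "A##"
```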


If you are sure that "if the first item is a chord, then the rest are chords", then instead of matching you can just split by whitespace:

line.split(/\s+/);
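One detail to watch with the split approach: chord lines usually start with spaces, and splitting such a line directly yields an empty string as the first element. A small sketch, using a made-up sample line, of trimming first:

```javascript
const line = "  G   Em7";

// Leading whitespace produces an empty first token.
const naive = line.split(/\s+/);          // ["", "G", "Em7"]

// Trimming before splitting avoids that.
const tokens = line.trim().split(/\s+/);  // ["G", "Em7"]
```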

Update

If you just want to match your pattern, regardless of whether the chord is inside a sentence (which is what your current code does):

/(?:^|\s)[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/

This regex is to be placed in the code you have in your question.

I check that the chord is preceded by a whitespace character or is at the beginning of the line with (?:^|\s) . You need to trim the leading space in the result, though.

Using \b instead of (?:^|\s) avoids the leading-space issue, but the meaning is different: \b matches between a word character and a non-word character, so it does not require whitespace. Unless you know the input well enough, I'd advise against it.
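To illustrate how the meaning differs, here is a small comparison on a made-up input containing a sharp. Because # is not a word character, there is no word boundary between "#" and the following space, so the \b version backtracks and drops the accidental:

```javascript
const input = "C# Em";

// Word-boundary version: "C#" is cut down to "C".
const withBoundary = input.match(
    /\b[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?\b/g
);
// ["C", "Em"]

// Whitespace-based version: keeps the "#", but needs a trim afterwards.
const withWhitespace = input.match(
    /(?:^|\s)[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/g
).map(function (chord) { return chord.trim(); });
// ["C#", "Em"]
```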


Another way is to split the string by \s+ and test the following regex against each token (note the ^ at the beginning and the $ at the end):

 /^[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?$/
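A possible sketch of this token-based approach that also keeps the positions the question asks for. Instead of `split`, it walks the line with /\S+/g so each token's index is available; the function name `getChordTokens` is made up for illustration.

```javascript
// Anchored pattern from above: the whole token must be a chord.
const chordToken = /^[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?$/;

// Hypothetical helper: collects chord tokens and their start indexes.
function getChordTokens(line) {
    const tokenPattern = /\S+/g; // one run of non-whitespace per match
    const chords = [];
    const positions = [];
    let token;
    while ((token = tokenPattern.exec(line)) !== null) {
        if (chordToken.test(token[0])) {
            chords.push(token[0]);
            positions.push(token.index);
        }
    }
    return { chords: chords, positions: positions };
}

const result = getChordTokens("  G   Em7");
// result.chords is ["G", "Em7"], result.positions is [2, 6]
```

Since whitespace is never part of a token here, nothing needs trimming and the positions point directly at the chords.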

Adding \b (word boundary) to the start and end works for me. Also, you can use A-G instead of ABCDEFG . Thus:

> re = /\b[A-G](?:#|##|b|bb)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:#|##|b|bb)?)?\b/g
/\b[A-G](?:#|##|b|bb)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:#|##|b|bb)?)?\b/g

> 'G/A   G/B  C           Em         C             G/B   D/A    G'.match(re)
["G/A", "G/B", "C", "Em", "C", "G/B", "D/A", "G"]

> 'Tempo: 1.5 * Quarter note = 68 BPM'.match(re)
null

In answer to the specific question in the title, use a lookahead:

 (?=\s)

When embedded in a regex, this ensures that the following character is whitespace without consuming it.

Try the following:
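A minimal illustration with a made-up input: the space is checked but does not become part of the match. Note that (?=\s) on its own fails at the end of the line, where there is no following character; (?!\S) or (?=$|\s) covers that case too.

```javascript
const m = /Em7(?=\s)/.exec("Em7 G");
const matched = m[0]; // "Em7", without the trailing space

// "G" is last on the line, so there is no whitespace after it.
const atEndLookahead = /G(?=\s)/.test("Em7 G");  // false
const atEndNegative = /G(?!\S)/.test("Em7 G");   // true
```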

function getChordMatches( line ) {
    var match,
        pattern = /(?:^|\s)([A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?\d*(?:\/[A-G](?:##?|bb?)?)?)(?=$|\s)/g,
        chords = [],
        positions = [];

    while ( match = pattern.exec(line) ) {
        chords.push( match[1] );
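        // match.index points at the leading whitespace (if any), so shift it to the chord itself
        positions.push( match.index + match[0].length - match[1].length );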
    }

    return {
        "chords" : chords,
        "positions" : positions
    };
}

It uses (?:^|\s) to make sure the chord is either at the start of the line or preceded by a space, and uses the positive lookahead (?=$|\s) to make sure the chord is followed by a space or is at the end of the line. Parentheses are added to capture the chord itself, which is then accessed via match[1] .
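If the target runtime supports lookbehind assertions (added to JavaScript in ES2018, so this is an assumption about your environment), the leading whitespace can be left unconsumed as well. The following is a sketch of that variant, not part of the original answer; the function name is made up. With (?<!\S) in front, match[0] is the chord itself and match.index is its position, so no capture group or trimming is needed.

```javascript
// Hypothetical variant using a negative lookbehind (requires ES2018 support).
function getChordMatchesLookbehind(line) {
    // (?<!\S): not preceded by a non-space character (start of line or whitespace)
    // (?!\S):  not followed by a non-space character (end of line or whitespace)
    const pattern = /(?<!\S)[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?\d*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/g;
    const chords = [];
    const positions = [];
    let match;
    while ((match = pattern.exec(line)) !== null) {
        chords.push(match[0]);
        positions.push(match.index);
    }
    return { chords: chords, positions: positions };
}

const found = getChordMatchesLookbehind("G/A  B7/D# Em");
// found.chords is ["G/A", "B7/D#", "Em"], found.positions is [0, 5, 11]
```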
