
JavaScript RegExp matching weirdness

I have a RegExp:

/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi

and some text, "Champion".

Somehow, this is coming back as a match. Am I crazy?

0: "pio"
1: "i"
index: 4
input: "Champion"
length: 2

The loop is here:

// construct the pattern dynamically
var someText = "Champion";
var phrase = ".?(NCAA|Division|I|Basketball|Champions,|1939-2011).?";
var pat = new RegExp(phrase, "gi"); // <- ends up as /.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi
var result;

while ((result = pat.exec(someText)) !== null) {
    // do stuff!
}
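A minimal runnable version of the loop above (plain Node.js or a browser console) reproduces the result. The `matches` array is an illustrative addition, not part of the original code; it just records what each `exec()` call returns:

```javascript
// Reproduces the question's loop against "Champion".
const someText = "Champion";
const phrase = ".?(NCAA|Division|I|Basketball|Champions,|1939-2011).?";
const pat = new RegExp(phrase, "gi");

const matches = [];
let result;
// With the g flag, exec() resumes from pat.lastIndex on each call,
// so the loop ends when exec() finally returns null.
while ((result = pat.exec(someText)) !== null) {
  matches.push({ match: result[0], group: result[1], index: result.index });
}

// Logs a single match: "pio" at index 4, with "i" captured by the group.
console.log(matches);
```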

There has to be something wrong with my RegExp, right?

EDIT: The .? was just a quick-and-dirty attempt to say that I'd like to match one of those words either on its own or with a single character on either side, for example:

\sNCAA\s
NCAA
NCAA\s
\sNCAA

GOAL: I'm trying to do some simple hit highlighting based on some search words. I've got a function that gets all of the text nodes on a page, and I'd like to go through them all and highlight any matches to any of the terms in my phrase variable.

I think that I just need to rework how I am building my RegExp.

Add start ( ^ ) and end ( $ ) anchors to the regexp.

/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi

Without the anchors, the regexp's match can start and end anywhere in the string, which is why

/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi.exec('Champion')

can match "pio" (with "i" captured by the group): it's actually matching around the case-insensitive I. If you leave the anchors off but remove the ...|I|..., the regex won't match 'Champion':

> /.?(NCAA|Division|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
null
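The three patterns from this answer can be compared side by side. This sketch drops the `g` flag, which only matters for repeated `exec()` calls on the same regex (it makes the regex remember `lastIndex`), so each result below is a single search from the start of the string:

```javascript
// Compares the patterns discussed above on the input 'Champion'.
const input = "Champion";

const unanchored = /.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/i;
const anchored = /^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/i;
const withoutI = /.?(NCAA|Division|Basketball|Champions,|1939-2011).?/i;

const unanchoredResult = unanchored.exec(input);
const anchoredResult = anchored.exec(input);
const withoutIResult = withoutI.exec(input);

// Unanchored: finds "pio" in the middle of the word.
console.log(unanchoredResult && unanchoredResult[0]);
// Anchored: the whole string would have to be one term plus at most
// one character on each side, so "Champion" is rejected.
console.log(anchoredResult);
// Without the I alternative, nothing in "Champion" matches.
console.log(withoutIResult);
```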

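Well, first of all you're specifying case-insensitivity, and secondly, the letter I is one of your alternatives.

In "Champion", the match is "pio": an optional character ("p"), then "i" (which matches I case-insensitively and is what the group captures), then another optional character ("o"). The same substring matches the simpler /.?I.?/gi.

It doesn't match /.?Champions,.?/gi, though, because "Champion" has neither the trailing "s" nor the comma.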
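Both claims can be checked directly. `String.prototype.match` with a `g` regex returns an array of whole matches (or `null`), and `RegExp.prototype.test` returns a boolean. The second test string containing "Champions," is made up for illustration:

```javascript
// Checks the two claims above against 'Champion'.
const text = "Champion";

// With the g flag, match() returns every whole match (capture groups are not included).
const iMatches = text.match(/.?I.?/gi);
console.log(iMatches); // one element: 'pio'

// "Champions," needs both the "s" and the comma, so it cannot match
// inside "Champion"; it does match text that actually contains it.
console.log(/.?Champions,.?/i.test(text)); // false
console.log(/.?Champions,.?/i.test("NCAA Champions, 1939-2011")); // true
```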

Champion matches /.?I.?/i .

Your own output notes that it's matching the substring "pio".

Perhaps you meant to bound the expression to the start and end of the input, with ^ and $ respectively:

/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi

I know you said to ignore the .? , but I can't: it's most likely wrong, and it's most likely going to continue to cause you problems. Explain why they're there and we can tell you how to do it properly. :)
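For the goal in the question's edit (highlighting search terms in page text), neither `.?` nor `\s?` helps: both are optional, so `.?I.?` and `\s?I\s?` still find the "i" inside "Champion". What is usually wanted is a word-boundary check. The sketch below is one possible approach, not something taken from the answers above. The helper names `escapeRegExp`, `buildTermPattern` and `highlight` are hypothetical, it works on plain strings rather than DOM text nodes, and it uses a lookbehind `(?<!\w)` and lookahead `(?!\w)` (lookbehind needs an ES2018+ engine) instead of `\b`, because `\b` would not find "Champions," when it is followed by a space:

```javascript
// Hypothetical helper: escape characters that are special in a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one case-insensitive, global pattern from the terms.
function buildTermPattern(terms) {
  // Longest terms first, so a longer alternative is tried before a shorter one.
  const alternatives = [...terms]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  // (?<!\w) and (?!\w): the term must not be preceded or followed by a
  // letter, digit or underscore. Unlike \b, this also works for terms
  // that end in punctuation, such as "Champions,".
  return new RegExp(`(?<!\\w)(?:${alternatives})(?!\\w)`, "gi");
}

// Hypothetical helper: wrap every whole-term match in <mark> tags.
// A fresh RegExp is built per call, so no stale lastIndex carries over.
function highlight(text, terms) {
  return text.replace(buildTermPattern(terms), (m) => `<mark>${m}</mark>`);
}

const terms = ["NCAA", "Division", "I", "Basketball", "Champions,", "1939-2011"];

console.log(highlight("Champion", terms)); // unchanged: Champion
console.log(highlight("NCAA Division I Basketball Champions, 1939-2011", terms));
// every term is wrapped in <mark>...</mark>
```

If the result is written back with `innerHTML`, any `<` or `&` already present in the page text would be interpreted as markup, so real text nodes need escaping or a DOM-based approach (splitting the text node and inserting `<mark>` elements); that part is not shown.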
