
Javascript - Use variable RegExp to match multiple keywords in an array of data

I'm using AngularJS here. Every keyword matches without problems except "C++". Whenever I type "c++" as a keyword to generate the RegExp in JavaScript and run the match, I get the following error in the console:

SyntaxError: Invalid regular expression: /(\bc++\b)/: Nothing to repeat

The code snippet is as below:

$scope.data = [
  {'title': 'Blue Java Programming Book'},
  {'title': 'Red C++ Programming Book'},
  {'title': 'Javascript Dummies Guide'}
];

$scope.submit = function() {
  $scope.keywordsArray = $scope.keywords.split(" ");
  $scope.length = $scope.keywordsArray.length;
  $scope.pattern = "";
  // one lookahead per keyword, so every keyword has to appear somewhere
  for (var y = 0; y < $scope.length; y++) {
    $scope.pattern += "(?=.*?\\b" + $scope.keywordsArray[y] + "\\b)";
  }
  $scope.pattern += ".*";
  $scope.patt = new RegExp($scope.pattern, "i");
  for (var x = 0; x < $scope.data.length; x++) {
    console.log("Match [" + x + "] " + $scope.patt.test($scope.data[x].title));
  }
};

<input type="text" ng-model="keywords">
<button ng-click="submit()">Submit</button>

I understand that the + sign in a RegExp matches the preceding character one or more times. I then tried hardcoding the RegExp as below to test it, and it matches, but that is not what I want, because the RegExp needs to be generated every time I key in the keywords.

 $scope.regExp = /c\+\++/i

Is there any way to generate a RegExp on the fly with multiple keywords to match an array of data that includes "c++"?

Assuming you collect the input in a variable ip, you can try this:

// rrexp contains the special characters which need to be escaped (not an exhaustive list)
var rrexp = new RegExp('[\\+|\\^|\\-|\\||\\?|\\*|\\{|\\}|\\$]', 'g');

var ip = 'c++';
var escapedExp = ip.replace(rrexp, function(matched) {
  return '\\' + matched;
});
/*
ip.replace replaces each special character in ip with its escaped version.
E.g. + is replaced by \+, so 'c++' becomes 'c\+\+'.
*/

var regEx = new RegExp(escapedExp, 'gi');
// this creates a global, case-insensitive regular expression from the escaped input

var q = 'Red C++ Programming Book';
q.match(regEx);  // this should output: [ 'C++' ]

Edit

If you want to create multiple regular expressions, you can put ip.replace and new RegExp in a loop. Something like:

var inputs = ['c++', 'simpleExp', 'complex$one'];
var regexList = [];
inputs.forEach(function(ip) {
  // escape the special characters of this input before compiling it
  var escapedExp = ip.replace(rrexp, function(matched) {
    return '\\' + matched;
  });
  regexList.push(new RegExp(escapedExp, 'gi'));
});
// regexList now contains one RegExp per input
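
One thing to watch when reusing such a list: a RegExp created with the 'g' flag remembers where its last match ended, so calling test() on it repeatedly can give alternating results. The sketch below illustrates this and then filters a list of titles with regular expressions that only use the 'i' flag. The names globalRegex, regexNoGlobal, titles and hits are made up for the illustration.

```javascript
// A regex with the "g" flag keeps state in lastIndex between test() calls.
var globalRegex = /c\+\+/gi;
var first = globalRegex.test("Red C++ Programming Book");  // true
var second = globalRegex.test("Red C++ Programming Book"); // false, search resumed after the previous match

// Without "g", test() always starts at the beginning of the string.
var regexNoGlobal = [/c\+\+/i, /book/i];
var titles = [
  "Blue Java Programming Book",
  "Red C++ Programming Book",
  "Javascript Dummies Guide"
];

// Keep only the titles that match every regex in the list.
var hits = titles.filter(function (title) {
  return regexNoGlobal.every(function (re) {
    return re.test(title);
  });
});

console.log(first, second); // true false
console.log(hits);          // [ 'Red C++ Programming Book' ]
```

If the regular expressions are only used with test(), dropping the 'g' flag is probably the simplest way to avoid the stateful behaviour.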

Edit 2: The \b word boundary cannot match words that start or end with special characters.

A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. Word characters are letters, digits and '_', so \b treats every other special character as a non-word character.
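
A small check of this behaviour, using the title from the question. The variable names are only illustrative.

```javascript
var title = "Red C++ Programming Book";

// After the final "+" comes a space. Both are non-word characters,
// so there is no word boundary at that position and the match fails.
var withTrailingBoundary = /\bc\+\+\b/i.test(title);

// Without the trailing \b the keyword is found.
var withoutTrailingBoundary = /\bc\+\+/i.test(title);

// "_" is a word character, so \b still works around it.
var underscore = /\bsnake_case\b/.test("use snake_case here");

console.log(withTrailingBoundary);    // false
console.log(withoutTrailingBoundary); // true
console.log(underscore);              // true
```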

I can suggest a hack: figure out where in the keyword the special characters appear and add \b accordingly. If the keyword ends with a special character, we cannot add \b after it, and likewise for the start of the keyword. If both ends are normal characters, we can add \b to both ends.

Here's how I would do it:

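// ip is the keyword and rrexp the special-character RegExp defined above
var noBAtStart = false;
var noBAtEnd = false;
var escapedExp = ip.replace(rrexp, function(matched, offset) {
  // remember whether a special character sits at either end of the keyword
  if (offset === 0)
    noBAtStart = true;
  if (offset === ip.length - 1)
    noBAtEnd = true;
  return '\\' + matched;
});

// only add \b next to a normal (word) character
if (!noBAtStart)
  escapedExp = '\\b' + escapedExp;
if (!noBAtEnd)
  escapedExp = escapedExp + '\\b';

var regEx = new RegExp(escapedExp, 'gi');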
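
Putting this together with the lookahead pattern from the question, a possible sketch looks like the following. It is a variation on the hack above rather than the same code: instead of tracking offsets during the replace, it checks the first and last character of the keyword against \w, which also covers characters such as '#' that are not in rrexp. The names buildPattern, special and titles are assumptions made for this example.

```javascript
// Build one case-insensitive RegExp that requires every keyword to appear.
function buildPattern(keywords) {
  // Characters with a special meaning in a regular expression.
  var special = /[.*+?^${}()|[\]\\]/g;

  var lookaheads = keywords
    .split(/\s+/)
    .filter(function (keyword) { return keyword.length > 0; })
    .map(function (keyword) {
      // Add \b only where the keyword starts or ends with a word character.
      var start = /^\w/.test(keyword) ? "\\b" : "";
      var end = /\w$/.test(keyword) ? "\\b" : "";
      var escaped = keyword.replace(special, "\\$&");
      return "(?=.*?" + start + escaped + end + ")";
    });

  return new RegExp("^" + lookaheads.join("") + ".*", "i");
}

var titles = [
  "Blue Java Programming Book",
  "Red C++ Programming Book",
  "Javascript Dummies Guide"
];

var cppBooks = titles.filter(function (title) {
  return buildPattern("c++ book").test(title);
});
var javaBooks = titles.filter(function (title) {
  return buildPattern("java").test(title);
});

console.log(cppBooks);  // [ 'Red C++ Programming Book' ]
console.log(javaBooks); // [ 'Blue Java Programming Book' ]
```

Because \b is kept around "java", the keyword does not match inside "Javascript". A keyword like "c" would still match the "C" in "C++", since a boundary exists between "C" and "+".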

You have to escape the special characters:

for (var y = 0; y < $scope.length; y++) {
  var specialRegexChars = ["*", "+", ".", "(", ")", "{", "}"];

  // For each character in the word, prepend it with \ if it's in our list of special characters
  var chars = $scope.keywordsArray[y].split("");
  for (var i = 0; i < chars.length; i++) {
    if (specialRegexChars.indexOf(chars[i]) !== -1) {
      chars[i] = "\\" + chars[i];
    }
  }
  $scope.pattern += "(?=.*?\\b" + chars.join("") + "\\b)";
}

Something like that. Note that this solution is pretty verbose, and that list of special chars is very limited.
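
A shorter way to cover more special characters is a single replace call with a character class. This is the widely used pattern from the MDN regular expressions guide, not something taken from the answers above, and escapeRegExp is just a name chosen for this sketch.

```javascript
// Prefix every regex metacharacter in `text` with a backslash.
// "$&" in the replacement string stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var escapedCpp = escapeRegExp("c++");         // c\+\+
var escapedDraft = escapeRegExp("[draft]");   // \[draft\]

// Without escaping, "." would match any character.
var dotIsLiteral = new RegExp(escapeRegExp("a.b")).test("axb");
var dotMatchesItself = new RegExp(escapeRegExp("a.b")).test("a.b");

console.log(escapedCpp);       // c\+\+
console.log(escapedDraft);     // \[draft\]
console.log(dotIsLiteral);     // false
console.log(dotMatchesItself); // true
```

The escaped keyword can then be placed inside the lookahead in the loop above in place of chars.join("").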

