
Javascript Regexp - Match string pattern except if string is inside specified tag

I am trying to replace every occurrence of ???some.text.and.dots??? in an HTML page with a link. I have built this regexp, which matches it:

\?\?\?([a-z0-9.]*)\?\?\?

However, I would like to exclude any match that is already inside a link, "<a ...> ... MY PATTERN ... </a>", and I am stuck on how to do that. All my attempts have failed so far.

It's not really clear what kind of "HTML" you are working on. If it is HTML source code as a string (something from an Ajax request, maybe), then you can use a regular expression that matches either a whole link or the pattern, and then work out what to do in a callback:

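var html = document.body.innerHTML;
// Match either a whole existing link (group 1) or the ???...??? pattern (groups 2 and 3)
html = html.replace(/(<a\s[^>]*>[\s\S]*?<\/a>)|(\?\?\?([a-z0-9.]*)\?\?\?)/g,
    function ( match, link, token, key ) {
       // existing links are returned unchanged; tokens become new links
       return link ? match : '<a href="#">' + key + '</a>';
    });
document.body.innerHTML = html;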

Conveniently, replace() can take a callback function as a replacement generator rather than a simple string.
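
The callback idea can be isolated into a pure string function, which makes it easy to try outside a browser. This is only a minimal sketch: the name linkifyTokens is hypothetical, the href="#" target is a placeholder, and it assumes lowercase, non-nested <a> tags in reasonably well-formed markup.

```javascript
// Hypothetical helper: wraps ???key??? tokens in links and leaves existing <a> elements alone.
function linkifyTokens(html) {
  // First alternative: a whole <a ...>...</a> element (no capture group).
  // Second alternative: the ???...??? token, capturing the text between the marks.
  var pattern = /<a\b[^>]*>[\s\S]*?<\/a>|\?\?\?([a-z0-9.]*)\?\?\?/g;
  return html.replace(pattern, function (match, key) {
    // key is undefined when the first alternative (an existing link) matched
    return key === undefined ? match : '<a href="#">' + key + '</a>';
  });
}
```

For example, linkifyTokens('x ???a.b??? y') wraps a.b in a new link, while a token that already sits inside an <a> element is returned untouched because the whole element is consumed by the first alternative.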

If you are working on a live DOM tree, however, you might want to preserve event handlers on nodes and not simply reset the innerHTML. You'll need a somewhat lower-level approach for that:

// returns all descendant text nodes that do not have an A element as an ancestor
function walker ( node ) {
  var nodes = [];
  for (var c, i = 0; c = node.childNodes[i]; i++) {
    if ( c.nodeType === 1 && c.tagName !== 'A' ) {
      // element other than a link: recurse (arguments.callee is not allowed in strict mode)
      nodes = nodes.concat( walker( c ) );
    }
    else if ( c.nodeType === 3 ) {
      nodes.push( c );
    }
  }
  return nodes;
}

// a single capture group, so split() keeps each match exactly once
var pattern = /(\?\?\?[a-z0-9.]*\?\?\?)/;
var textNodes = walker( document.body );
for (var i = 0; i < textNodes.length; i++) {
  var node = textNodes[i], parent = node.parentNode;
  // create an array of strings: plain text at even indexes, matches at odd indexes
  var m = node.nodeValue.split( pattern );
  if ( m.length > 1 ) {
    for (var j = 0; j < m.length; j++) {
      // create a link for any occurrence of the pattern
      if ( j % 2 === 1 ) {
        var a = document.createElement( 'a' );
        a.href = "#";
        a.textContent = m[j].slice( 3, -3 );  // m[j] if you don't want to crop the ???'s
        parent.insertBefore( a, node );
      }
      else if ( m[j] !== '' ) {
        // surrounding text is re-inserted unchanged
        parent.insertBefore( document.createTextNode( m[j] ), node );
      }
    }
    // remove original text node
    parent.removeChild( node );
  }
}

This method only touches text nodes, and then only those that contain the pattern.
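
The splitting step is the part most likely to go wrong, because split() with a capturing group interleaves the captured text with the surrounding pieces. The following is a minimal sketch of that step on its own, with no DOM involved. The name splitTokens and the { type, value } segment shape are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: splits a string into plain-text and token segments.
function splitTokens(text) {
  // With one capture group in the separator, split() returns
  // [text, key, text, key, ..., text]: captured keys sit at odd indexes.
  var parts = text.split(/\?\?\?([a-z0-9.]*)\?\?\?/);
  var segments = [];
  for (var i = 0; i < parts.length; i++) {
    if (i % 2 === 1) {
      segments.push({ type: 'link', value: parts[i] });
    } else if (parts[i] !== '') {
      // skip the empty strings split() produces next to a leading or trailing token
      segments.push({ type: 'text', value: parts[i] });
    }
  }
  return segments;
}
```

Each 'link' segment would become an <a> element and each 'text' segment a text node when rebuilding the parent's children.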

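JavaScript did not originally support look-behind (look-behind assertions were only added in ES2018), so you cannot rely on it in older engines. Instead, you'd need to run .match() or .exec() and then, for each of your matches, check the surrounding tags: whether an opening tag (such as /<a\s+.*?>/) comes before your match with no closing </a> in between, and a </a> follows after your match.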
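
A rough sketch of that match-then-check idea follows. It is an illustration rather than a robust HTML parser: the name findUnlinkedTokens is hypothetical, and it assumes lowercase tags and links that are never nested. For each match it compares the position of the last opening <a before the match with the position of the last </a>.

```javascript
// Hypothetical helper: returns the tokens that are not inside an <a> element.
function findUnlinkedTokens(html) {
  var pattern = /\?\?\?([a-z0-9.]*)\?\?\?/g;
  var results = [];
  var m;
  while ((m = pattern.exec(html)) !== null) {
    var before = html.slice(0, m.index);
    // position of the last opening <a ...> (or bare <a>) and the last </a> before the match
    var lastOpen = Math.max(before.lastIndexOf('<a '), before.lastIndexOf('<a>'));
    var lastClose = before.lastIndexOf('</a>');
    // if the most recent <a was already closed (or none was found), the match is outside a link
    if (lastOpen <= lastClose) {
      results.push({ index: m.index, key: m[1] });
    }
  }
  return results;
}
```

The returned indexes can then be used to splice links into the string, working from the last match to the first so that earlier indexes stay valid.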

Good luck!!
