
JavaScript - How to add links to words in a page without interfering with the HTML in the page

I'm making a tool which searches for particular words, and when it finds them, it wraps them in a <span> tag and adds a link to them. I thought at first this would be simple, but quickly realised it's not, because there are so many different ways it can mess up the HTML elements in the page.

NOTE: in this example, it's looking for codes like this rs25283 , but this script also needs to look for predefined words which will be supplied in an array.

I started with something extremely simple:

var pattern = new RegExp("(rs[0-9]+)","ig");
var output = $('body').html().replace(pattern, function replacer(contents, word) {
    return '<span>' + word + ' <a href="https://mylink.com/item/' + word + '"></a>  </span>';
});

Which of course failed miserably because it replaces words inside HTML elements and attributes and creates a complete mess. Adding spaces to the pattern like this:

var pattern = new RegExp("([ ]rs[0-9]+[ ])","ig");

Will reduce the number of misreplacements, but still won't work because for example, there could be HTML like this <img src="whatever.jpg" alt="Some info about rs25162 in here.">

so the script will break that img tag.
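
To make the failure concrete, here is a small sketch that runs the first pattern over that img tag as a plain string. The sample markup is the one from above; nothing here touches a real page.

```javascript
// Run the naive regex over raw markup to see what happens to the attribute.
var html = '<img src="whatever.jpg" alt="Some info about rs25162 in here.">';
var pattern = /(rs[0-9]+)/ig;

var broken = html.replace(pattern, function (match, word) {
  return '<span>' + word + ' <a href="https://mylink.com/item/' + word + '"></a></span>';
});

// The alt value now contains injected tags, and the injected double quotes
// end the attribute early, so the browser no longer sees the original tag.
console.log(broken);
```

The regex has no idea whether a match sits in text or inside a tag, which is why any approach that works on the HTML string runs into this.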

So a more evolved approach I tried is to split the whole page into parts like this:

var words = $('body').html().split(' ');

Then loop through every part and see if it can find a match. For finding a match, I will have an array of the words I'm looking for, so when looping through each word on the page, I check to see if it exists in the array.

So like:

var search_words = [
  'rs14235',
  'rs6262',
  'COMT',
  'ACE'
];

for (var i = 0; i < words.length; i++) {
  if (search_words.indexOf(words[i]) > -1) {
    // do something with the matched word
  }
}
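
One limitation of the indexOf lookup is that a word followed by punctuation (for example "COMT,") will not match. A possible alternative is to build a single regular expression from the list. This is only a sketch: the helper names escapeRegExp and buildWordPattern are made up for illustration, and the match is assumed to be case-sensitive.

```javascript
// Hypothetical helpers; the names are illustrative, not from the original post.
function escapeRegExp(text) {
  // Escape characters that have a special meaning inside a regular expression.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function buildWordPattern(words) {
  // Longest first, so an entry such as "ACE-2" would be tried before its prefix "ACE".
  var alternatives = words
    .slice()
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp);
  // \b keeps "ACE" from matching inside "PLACE".
  return new RegExp('\\b(?:' + alternatives.join('|') + ')\\b', 'g');
}

var search_words = ['rs14235', 'rs6262', 'COMT', 'ACE'];
var wordPattern = buildWordPattern(search_words);
var found = 'PLACE holders aside, COMT, ACE and rs6262.'.match(wordPattern);
console.log(found); // ["COMT", "ACE", "rs6262"]
```

This only solves the word-matching part; it still has to be applied to text, not to the HTML string.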

Now the problem still exists that it will break tags, but what I can do is check when an attribute is opened with ", so I'll know if the word is inside an HTML tag's attribute. The tags themselves are a bit trickier. For example, if this appears <h1>Title with word in it</h1> , I don't wanna replace that word. I can't filter out everything that appears in an HTML tag, because the words I need to replace are likely inside <p> , <div> , <span> and other tags.

So would the best solution here be to create a list of blacklisted HTML tags? I assume thousands of programmers have faced this exact scenario, so I don't wanna reinvent any wheels here. If anyone can show me the best approach to doing this, it'd be much appreciated.

EDIT: I found this article describing the issue: http://james.padolsey.com/javascript/replacing-text-in-the-dom-its-not-that-simple/
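
A common way around the problem is to leave the HTML string alone and walk the DOM's text nodes instead, skipping any text that sits under a blacklisted tag. The following is a minimal sketch under stated assumptions, not a definitive implementation: the names SKIP_TAGS, splitByMatches, hasSkippedAncestor and linkifyTextNodes are made up, the tag list is only an example, and the word itself is turned into the link text (the original snippet left the link empty).

```javascript
// Tags whose text should be left alone; this list is an assumption, adjust as needed.
var SKIP_TAGS = ['A', 'H1', 'SCRIPT', 'STYLE', 'TEXTAREA'];

// Pure helper: split a string into plain pieces and matched pieces.
function splitByMatches(text, pattern) {
  if (!pattern.global) throw new Error('pattern needs the "g" flag');
  var parts = [];
  var last = 0;
  var m;
  pattern.lastIndex = 0;
  while ((m = pattern.exec(text)) !== null) {
    if (m[0] === '') { pattern.lastIndex++; continue; } // skip empty matches
    if (m.index > last) parts.push({ text: text.slice(last, m.index), match: false });
    parts.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) parts.push({ text: text.slice(last), match: false });
  return parts;
}

// True if any ancestor element of the text node is in SKIP_TAGS.
function hasSkippedAncestor(node) {
  for (var el = node.parentNode; el; el = el.parentNode) {
    if (el.nodeType === 1 && SKIP_TAGS.indexOf(el.tagName) > -1) return true;
  }
  return false;
}

// Browser only: replace matches in text nodes under root with <span><a>…</a></span>.
function linkifyTextNodes(root, pattern, baseUrl) {
  var doc = root.ownerDocument;
  var walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  var textNodes = [];
  // Collect first, then modify, so the walker is not disturbed by the changes.
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  textNodes.forEach(function (node) {
    if (hasSkippedAncestor(node)) return;
    var parts = splitByMatches(node.nodeValue, pattern);
    if (!parts.some(function (p) { return p.match; })) return;

    var fragment = doc.createDocumentFragment();
    parts.forEach(function (p) {
      if (!p.match) {
        fragment.appendChild(doc.createTextNode(p.text));
        return;
      }
      var span = doc.createElement('span');
      var link = doc.createElement('a');
      link.href = baseUrl + encodeURIComponent(p.text);
      link.textContent = p.text;
      span.appendChild(link);
      fragment.appendChild(span);
    });
    node.parentNode.replaceChild(fragment, node);
  });
}

// Example call in a browser:
// linkifyTextNodes(document.body, /\brs[0-9]+\b/gi, 'https://mylink.com/item/');
```

Because only text nodes are read and written, attributes such as alt are never touched, and the blacklist reduces to a check on each text node's ancestors.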

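You can try using jQuery's .not() method to exclude the elements you don't want to touch. Note that .not() filters by selector, not by regular expression, so the pattern still has to be applied afterwards to the text of the remaining elements. For example:

var pattern = new RegExp("(rs[0-9]+)","ig");
// Select the elements to search, then drop blacklisted tags and anything inside them.
var $targets = $("body *").not("h1, a, script, style, h1 *, a *");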
