
JS - regex for finding URLs in body text not working

I am trying to implement a regex that I found. I would like to find every a tag whose URL starts with http, https or www and then just add target="blank" to it. So, the code looks like this:

    const urlRegex = /(((https?:\/\/)|(www\.))[^\s]+)/g;

    return this.node.body.replace(urlRegex, function(url) {
        return `<a href="${url}" target="blank">`;
    });

And if I get a text like this:

<p>
  <a href='www.norden.org'>Nordens</a>
</p>
<p>
  <figure>
    <img src='http://tornado-node.net/wp-content/uploads/2017/08/Ove-Hansen.jpg' alt=' Styreleder Ove Hansen. Foto: Arne Walderhaug' />   

    <figcaption>Ove Hansen, styreleder i Norden</figcaption>
  </figure>
</p>
<p>
  <a href='http://norden.org/documents.html'>norden.org</a>
</p>

This is the result from the above function:

<p>
  <a href='<a href=\"www.norden.org'>Nordens</a>
</p>
<p>
   <figure>
     <img\" target=\"blank\"> src='<a href=\"http://tornado-node.net/wp-content/uploads/2017/08/Ove-Hansen.jpg'\" target=\"blank\"> alt=' Styreleder Leif-Ove Hansen. Foto: Arne Walderhaug' />
     <figcaption>Ove Hansen, styreleder i Norden</figcaption>
    </figure>
</p>
<p>
   <a href='<a href=\"http://norden.org/documents.html'>norden.org</a></p>\" target=\"blank\">"

What is the correct way to implement this?

Update

I am also trying with finding the href in the text like this:

    let str   = this.node.body;
    const regex = /(href=\')([^\']*)(\')/g;

    if (str.match(regex)) {
      for(let i = 0; i < str.match(regex).length; i++) {
        let url = str.match(regex)[i] + ' target="_blank"';
      }
    }

That gives me an array of the strings that match href, and I append target="_blank" to each one. But how can I now put the modified strings back into the text that I am checking?

When dealing with HTML, try to avoid parsing it as a string; let the browser's DOM do the parsing instead. You can try something like this:

Logic:

  • Create a dummy element to work on. This will be an in-memory element and will not be rendered.
  • Set the HTML string as its innerHTML.
  • Fetch every element that can have a URL in it, like a or img.
  • Loop over this list and test the relevant attribute (href or src) against the regex.
  • If it matches, add the target attribute.

    function getUpdatedHTMLString(htmlString) {
      // No "g" flag here: a global regex remembers lastIndex between test()
      // calls, so a later test() on a different string can fail unexpectedly.
      var urlRegex = /(((https?:\/\/)|(www\.))[^\s]+)/;

      // In-memory element; it is never attached to the document.
      var dummy = document.createElement('div');
      dummy.innerHTML = htmlString;

      var list = dummy.querySelectorAll('a, img');
      for (var i = 0; i < list.length; i++) {
        var href = list[i].getAttribute('href');
        var src = list[i].getAttribute('src');
        if (urlRegex.test(src) || urlRegex.test(href)) {
          list[i].setAttribute('target', '_blank');
        }
      }
      return dummy.innerHTML;
    }

    var str = "<p>" +
      "<a href='www.norden.org'>Nordens</a>" +
      "</p>" +
      "<p>" +
      "<figure>" +
      "<img src='http://tornado-node.net/wp-content/uploads/2017/08/Ove-Hansen.jpg' alt=' Styreleder Ove Hansen. Foto: Arne Walderhaug' />" +
      "<figcaption>Ove Hansen, styreleder i Norden</figcaption>" +
      "</figure>" +
      "</p>" +
      "<p>" +
      "<a href='http://norden.org/documents.html'>norden.org</a>" +
      "</p>";

    console.log(getUpdatedHTMLString(str));
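If no DOM is available (for example when the string is processed outside the browser), the href regex from the update in the question can be put back into the text with String.prototype.replace and a replacer function, instead of looping over match(). The following is only a minimal sketch: the function name addTargetBlank is made up for illustration, and it assumes every href is single-quoted as in the sample markup and that no tag already has a target attribute. Regex-based editing of HTML stays fragile, so the DOM approach is preferable whenever it is available.

```javascript
// Hypothetical helper (not from the original post): appends target="_blank"
// after every single-quoted href attribute found in the HTML string.
function addTargetBlank(html) {
  // Same idea as the regex in the question's update: href='...'
  const hrefRegex = /href='([^']*)'/g;
  // replace() calls the function once per match and substitutes its return
  // value, so the modified attribute lands back in the original text.
  return html.replace(hrefRegex, function (match) {
    return match + ' target="_blank"';
  });
}

const sample = "<p><a href='www.norden.org'>Nordens</a></p>";
console.log(addTargetBlank(sample));
// prints: <p><a href='www.norden.org' target="_blank">Nordens</a></p>
```

Because only href is matched, the src of an img is left untouched.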

