
Get text from <a> tags in text using JavaScript

I'm getting HTML content from an API.

A sample message could look like the one below:

Lorem ipsum dolor sit amet <a href="https://example.com">example.com</a>
Pellentesque porta ligula et justo condimentum, nec tincidunt libero tempor.
Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor <a href="https://google.com">google.com</a>

I need my message to look like the one below: plain text, with each link replaced by its text.

Lorem ipsum dolor sit amet example.com
Pellentesque porta ligula et justo condimentum, nec tincidunt libero tempor.
Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor google.com

I've tried to use a regex with groups; the JS code is below:

const r = /^<a href.*>(.*?)<\/a>$/gm

let link = `<a href="https://google.com" target="_blank">google.com</a> test <a href="test.com">test.com</a>`

let result

while((result = r.exec(link)) !== null) {
  const match = result[1];
  link = link.replace(r, match)
}

console.log(link)

I also tried simpler code, like the one below:

const r = /^<a href.*>(.*?)<\/a>$/gm

let link = `<a href="https://google.com" target="_blank">google.com</a> test <a href="test.com">test.com</a>`

link = link.replaceAll(r, "$1")

console.log(link)

Unfortunately, in both cases console.log prints "test.com" after running my code, not the whole message.

Are there any better solutions?
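
For context, both attempts appear to fail for the same reason: `^` and `$` anchor the match to a whole line, and the greedy `.*` after `href` runs up to the last `>` that still lets the rest of the pattern match. The sketch below reuses the `link` string from the question to show what the pattern actually captures, followed by one possible unanchored alternative. The names `anchored` and `unanchored` are only illustrative.

```javascript
const link = `<a href="https://google.com" target="_blank">google.com</a> test <a href="test.com">test.com</a>`;

// Original pattern: anchored to the whole line, with a greedy `.*`.
const anchored = /^<a href.*>(.*?)<\/a>$/gm;
const m = anchored.exec(link);
console.log(m[0] === link); // true: the single match spans the entire string
console.log(m[1]);          // "test.com": only the last link text is captured

// Unanchored alternative: `[^>]*` cannot run past the end of the opening tag.
const unanchored = /<a\b[^>]*>(.*?)<\/a>/g;
const result = link.replace(unanchored, "$1");
console.log(result); // "google.com test test.com"
```

Because the one match covers the whole string, replacing it with group 1 leaves only "test.com", which is the output described above.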

You do not need to do it with a regular expression. You can use the DOM to remove the links and any other HTML tags.

const htmlString = `Lorem ipsum dolor sit amet <a href="https://example.com">example.com</a> Pellentesque porta ligula et justo condimentum, nec tincidunt libero tempor. Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor <a href="https://google.com">google.com</a>`;
const parser = new DOMParser();
const doc = parser.parseFromString(htmlString, "text/html");
const text = doc.body.textContent;
console.log(text);

If you just want to remove the links and leave the other HTML tags in place, that is also possible.

const htmlString = `Lorem ipsum dolor sit amet <a href="https://example.com">example.com</a> Pellentesque <b>porta</b> ligula <em>et justo</em> condimentum, nec tincidunt libero tempor. Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor <a href="https://google.com">google.com</a>`;
const parser = new DOMParser();
const doc = parser.parseFromString(htmlString, "text/html");
const anchors = doc.body.querySelectorAll("a");
// Replace each <a> element with its own child nodes, so the link text stays in place.
anchors.forEach(node => node.replaceWith(...node.childNodes));
const htmlWithAnchorsRemoved = doc.body.innerHTML;
console.log(htmlWithAnchorsRemoved);

The pattern for removing all anchor tags from a text would be something like this:

<a.*?</a>

with the global flag.

It searches specifically for the anchor tags in your string and matches them globally (i.e. throughout the text you are using). You can use this regex with the replaceAll function like this:

// Regex literal with the g flag; "$1" keeps each link's text.
let value = string.replaceAll(/<a[^>]*>(.*?)<\/a>/g, "$1");
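
One caveat worth checking when using `replaceAll` in JavaScript: a string pattern is matched as literal text, so the pattern has to be a regular expression literal, and that literal must carry the `g` flag. The following sketch illustrates the difference. `input` is a made-up sample, and `$1` is used as the replacement on the assumption that the link text should be kept rather than dropped.

```javascript
const input = `See <a href="https://example.com">example.com</a> and <a href="https://google.com">google.com</a>`;

// A string pattern is treated as literal text, so nothing matches here.
const literal = input.replaceAll("<a[^>]*>(.*?)</a>", "");
console.log(literal === input); // true

// A regex literal with the g flag matches every anchor; "$1" keeps the link text.
const kept = input.replaceAll(/<a[^>]*>(.*?)<\/a>/g, "$1");
console.log(kept); // "See example.com and google.com"

// An empty replacement removes the link text as well.
const dropped = input.replaceAll(/<a[^>]*>(.*?)<\/a>/g, "");
console.log(dropped); // "See  and "
```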


Hope this helps. Let me know if you have any queries.

Regards

Using a regexp to parse HTML is never a good path to follow. Maybe the following will help you?

const html = `Lorem ipsum dolor sit amet <a href="https://example.com">example.com</a> Pellentesque porta ligula et justo condimentum, nec tincidunt libero tempor. Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor <a href="https://google.com">google.com</a>`;

function html2text(html) {
  const o = document.createElement("div");
  o.innerHTML = html;
  return o.textContent;
}

console.log(html2text(html));

Thanks for all the answers. The solution from @bobble-bubble's comment works for me.

Code snippet below:

const replaceHTML = (text) => {
  const rLink = /<\/?a\b[^><]*>/gi;
  text = text.replace(rLink, "");
  return text;
};

console.log(replaceHTML(`<a href="google.com" target="_blank">google.com</a>`));
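
To see why this pattern leaves other tags alone: `<\/?a\b` matches the start of an opening or closing anchor tag, the `\b` requires a word boundary right after the `a` so that tags such as `<abbr>` are skipped, and `[^><]*>` consumes the attributes up to the closing `>`. A small check, using a sample string that is not from the original post:

```javascript
const rLink = /<\/?a\b[^><]*>/gi;

// Made-up sample: an <abbr> tag, an upper-case anchor, and a <b> tag.
const sample = `<abbr title="HyperText Markup Language">HTML</abbr> from <A HREF="https://example.com">example.com</A> and <b>bold</b>`;

// Only the anchor's opening and closing tags are removed; its text stays.
const stripped = sample.replace(rLink, "");
console.log(stripped);
// <abbr title="HyperText Markup Language">HTML</abbr> from example.com and <b>bold</b>
```

Note that an attribute value containing a literal `>` would cut the match short, which is a limitation the DOM-based answers do not have.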

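// `text` is the HTML string received from the API.
const temp = document.createElement('template');
temp.innerHTML = text;
// Replace each anchor with its visible text (e.href would insert the link's URL instead).
temp.content.querySelectorAll('a').forEach(e => { e.replaceWith(e.textContent); });
console.log(temp.innerHTML);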


 