Find all <a href> HTML tags and append target blank values using Python regular expression

Question

I want to find all <a href='https://example.com/'> references in a large file and append the target='_blank' rel='noopener noreferrer' option to the end of the tag, if it is missing.

Roughly, I did the following:

re.sub(r'<a href=([^>]+)', r'<a href=([^>]+)' + " target='_blank' rel='noopener noreferrer'", content)

Note: content contains the body of text to alter.

But the second argument, which should be the replacement value, is messing up the result.

The output I am getting is:

<a href=([^>]+) target='_blank' rel='noopener noreferrer'>

The expected result should be:

<a href='https://example.com/' target='_blank' rel='noopener noreferrer'>

What am I doing incorrectly, and how do I fix this issue?

Answer 1

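The replacement argument of re.sub() is not a pattern. Apart from backslash escapes and backreferences such as \1, it is inserted literally, which is why ([^>]+) shows up in your output. Wrap the part you want to keep in a capture group and refer back to it with \1. Try this (if you are coding professionally, use the tool ti7 suggested):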

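import re
content = "<a href='https://example.com/'>"
# Group 1 captures "<a href=" and everything up to, but not including, the ">".
# In the replacement, \1 re-inserts that captured text; the rest is literal.
x = re.sub(r'(<a href=[^>]+)', r'\1' + " target='_blank' rel='noopener noreferrer'", content)
print(x)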

output:
   <a href='https://example.com/' target='_blank' rel='noopener noreferrer'>
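The code above appends the attributes to every matching tag, including tags that already have a target, while the question asks to add them only when they are missing. A minimal sketch, assuming that any tag already containing a target= attribute should be left alone, adds a negative lookahead for that case; the sample content is hypothetical. Like any regex over HTML, it can misfire on a ">" inside a quoted attribute value, or on text such as data-target= or ?target= inside another attribute.

```python
import re

# Hypothetical sample input: the second tag already has a target attribute.
content = (
    "<a href='https://example.com/'>Example</a>\n"
    "<a href='https://example.org/' target='_self'>Other</a>"
)

# Match "<a" followed by whitespace and its attributes up to the closing ">",
# but only when the negative lookahead finds no "target=" before that ">".
pattern = re.compile(r"<a\s(?![^>]*\btarget=)([^>]*)>")

# \g<1> re-inserts the captured attributes; the new ones are appended after them.
result = pattern.sub(r"<a \g<1> target='_blank' rel='noopener noreferrer'>", content)
print(result)
```

The \s after <a keeps tags such as <abbr> from matching, and tags that already declare a target pass through unchanged.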

Answer 2

If you can use a 3rd-party library, BeautifulSoup may work very well for you!
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

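from bs4 import BeautifulSoup
soup = BeautifulSoup(html_contents, "html.parser")  # html_contents: your HTML as a string
for a in soup.find_all("a", href=True):  # every <a> tag that has an href
    a.attrs.setdefault("target", "_blank")  # only added if missing
    a.attrs.setdefault("rel", "noopener noreferrer")
print(soup)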

Answer 1: score 1, posted 2022-11-21 05:16:35

Answer 2: score 0, posted 2022-11-21 04:59:57
