简体   繁体   中英

BeautifulSoup: How to replace value in an element with an element tag?

Say that I have this piece of HTML:

<p>This text is my <a href="#">text</a><p>

How do I replace the first "text" with an anchor element, so the result becomes:

<p>This <a href="#">text</a> is my <a href="#">text</a><p>

I basically want to replace a substring in a NavigableString with a Tag.

Your question has two parts:

  1. Turning the single NavigableString "This text is my" into a NavigableString, a Tag, and another NavigableString.

  2. Replacing the NavigableString "This text is my" with the three new elements.

The answer to #1 depends on your situation. Specifically it depends on how you determine what part of the text needs linking. I'll use a regular expression to find the string "text":

from bs4 import BeautifulSoup
data = '<p>This text is my <a href="#">text</a><p>'

soup = BeautifulSoup(data)
original_string = soup.p.contents[0]

print(original_string)
# "This text is my "

import re
this, text, is_my = re.compile("(text)").split(original_string)

Now for #2. This is not as easy as it could be, but it's definitely possible. First, Turn text into a Tag containing the link text:

text_link = soup.new_tag("a", href="#")
text_link.string = text

re.split() turned this and is_my into ordinary Unicode strings. Turn them back into NavigableString s so they can go back into the tree as elements:

this = soup.new_string(this)
is_my = soup.new_string(is_my)

Now use replace_with() and insert_after to replace the old element with the three new elements:

original_string.replace_with(this)
this.insert_after(text_link)
text_link.insert_after(is_my)

Now your tree should look the way you want it to:

print(soup.p)
# <p>This <a href="#">text</a> is my <a href=""></a></p>

You can get the text of NavigableString, modify it, build new object model from modified text and then replace old NavigableString with this object model:

data = '<p>This text is my <a href="#">text</a><p>'
soup = BeautifulSoup(data)
original_string = soup.p.contents[0]
new_text = unicode(original_string).replace('text', '<a href="#">text</a>')
original_string.replaceWith(BeautifulSoup(text))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM