How to change inner text of HTML tags without removing them

Assuming I have a "semi"-HTML string like this:

some_string = "sometext<body>someText<h1>Text</h1>Worldt<p>And some text here<br>Text.</p></body>HereAlsoText"

I need to replace all the text in the string while keeping all HTML tags (including br):

"UPDATED<body>UPDATED<h1>UPDATED</h1>UPDATED<p>UPDATED<br>UPDATED</p></body>UPDATED"

The following code works for most tags, but it cannot handle the <br> tag or the text before and after the HTML (outside the body tag, in this case):

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(some_string, "html.parser")


# Find all tags
tags = soup.find_all()
# Loop through child tags
for tag in tags:
    # Check whether the tag has exactly one string child
    if tag.string:
        if tag.name != 'br':

            # Replace string
            tag.string.replace_with("TEST")

for parent_tag in tags:
    if not parent_tag.string:
        parent_tag.string = ''.join(
        ["TEST"
            if not re.match(r'<[^>]+>', str(t)) else str(t)
            for t in parent_tag.contents])

Appreciate your help. Thanks!

Keep it simpler: select all the text nodes directly and replace their text, as you already tried in your example. This also covers the text around <br> and the text outside the body tag:

# string=True matches every text node (text=True is the older spelling of the same argument)
for e in soup.find_all(string=True):
    e.replace_with('UPDATE')
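
As a rough illustration of why the loop in the question misses some text, the sketch below inspects the same input. It assumes bs4 is installed; the variable names tag_names, p_string, h1_string and text_nodes are only illustrative and not part of the original post.

```python
from bs4 import BeautifulSoup

some_string = "sometext<body>someText<h1>Text</h1>Worldt<p>And some text here<br>Text.</p></body>HereAlsoText"
soup = BeautifulSoup(some_string, "html.parser")

# find_all() with no arguments yields Tag objects only, never bare text,
# so text outside <body> is never visited by a loop over tags.
tag_names = [tag.name for tag in soup.find_all()]

# .string is None when a tag has more than one child (here: text, <br>, text).
p_string = soup.p.string
h1_string = soup.h1.string

# string=True matches every text node, including those outside <body>.
text_nodes = [str(node) for node in soup.find_all(string=True)]

print(tag_names)
print(p_string, h1_string)
print(text_nodes)
```

With this input, the <p> tag reports no .string because it holds two text nodes and a <br>, while the text-node search finds all seven pieces of text.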

Example

from bs4 import BeautifulSoup

some_string = 'sometext<body>someText<h1>Text</h1>Worldt<p>And some text here<br>Text.</p></body>HereAlsoText'

soup = BeautifulSoup(some_string, 'html.parser')

# Replace every text node in place; the tags themselves are left untouched
for e in soup.find_all(string=True):
    e.replace_with('UPDATE')

print(soup)

Output

UPDATE<body>UPDATE<h1>UPDATE</h1>UPDATE<p>UPDATE<br/>UPDATE</p></body>UPDATE
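
If each piece of text needs to be transformed rather than replaced by a constant, the same idea can be wrapped in a small helper. This is only a sketch under assumptions: replace_text_nodes and its transform parameter are hypothetical names introduced here, and skipping whitespace-only nodes is a design choice that the original answer does not make.

```python
from bs4 import BeautifulSoup


def replace_text_nodes(html, transform):
    """Return html with every non-blank text node passed through transform."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so replacing nodes while looping is safe.
    for node in soup.find_all(string=True):
        if node.strip():  # leave whitespace-only nodes untouched
            node.replace_with(transform(str(node)))
    return str(soup)


some_string = "sometext<body>someText<h1>Text</h1>Worldt<p>And some text here<br>Text.</p></body>HereAlsoText"
result = replace_text_nodes(some_string, str.upper)
print(result)
```

Passing lambda text: 'UPDATED' as transform should reproduce the constant replacement requested in the question.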
