
How to change EVERY child tag (of a specific kind) to a different one using BeautifulSoup

Given the HTML below:

given = """<html>
    <body>
        Free Text: Above
        <ul>
            <li> data 1 </li>
            <li>
                <ul>
                    <li> 
                        <ol start = "321">
                            <li> sub-sub list 1 
                                <ol>
                                    <li> sub sub sub list </li>
                                </ol>
                            </li>
                            <li> sub-sub list 2 </li>
                        </ol>
                    </li>
                    <li> sub list 2 </li>
                    <li> sub list 3 </li>
                </ul>
            </li>
            <li> <p> list type paragraph </p> data 3 </li>
        </ul>

        Free Text: Middle
        
        <ul>
            <li> Second UL list </li>
            <li> Second List part 2 </li>
        </ul>

        Free Text : Below
    </body>
</html>"""

My question:

How can I change the child <li> tags that have another <li> as ANY of their ancestors to something else, say <SOME>? (Please don't ask why I would want to, or point out that I won't be able to render it. I have my reasons.)

In a nutshell, I want the HTML above to end up looking like this:

    result = """<html>
        <body>
            Free Text: Above
            <ul>
                <li> data 1 </li>
                <li>
                    <ul>
                        <SOME>
                            <ol start = "321">
                                <SOME> sub-sub list 1
                                    <ol>
                                        <SOME> sub sub sub list </SOME>
                                    </ol>
                                </SOME>
                                <SOME> sub-sub list 2 </SOME>
                            </ol>
                        </SOME>
                        <SOME> sub list 2 </SOME>
                        <SOME> sub list 3 </SOME>
                    </ul>
                </li>
                <li> <p> list type paragraph </p>data 3 </li>
            </ul>

            Free Text: Middle

            <ul>
                <li> Second UL list </li>
                <li> Second List part 2 </li>
            </ul>

            Free Text: Below
        </body>
    </html>"""

I tried the following (with and without tag.decompose()):

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(given, 'html.parser')
    for tag in soup.find_all(['li']):
        if tag.find_parents("li"):
            new_tag = soup.new_tag("SOME")
            new_tag.string = tag.text
            tag.replace_with(new_tag)
    result = str(soup)

but it doesn't seem to work at depth > 1, i.e. for inner tags such as the sub-sub list items.

  • Instead of using .replace_with(), you can simply rename the tag via its .name attribute, which keeps the structure intact:

    for tag in soup.select('li li'):
        tag.name = 'SOME'
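
A likely reason the loop in the question loses the inner items is that `new_tag.string = tag.text` copies only the flattened text, so any tags nested inside the replaced `<li>` are not carried over. The sketch below contrasts the two approaches on a reduced, hypothetical three-level input (the `markup` string and the variable names are made up for illustration, not taken from the question):

```python
from bs4 import BeautifulSoup

# Hypothetical reduced input: three levels of nested <li>.
markup = "<ul><li>a<ul><li>b<ul><li>c</li></ul></li></ul></li></ul>"

# Approach from the question: build a new tag from the text only.
replaced = BeautifulSoup(markup, "html.parser")
for tag in replaced.find_all("li"):
    if tag.find_parents("li"):
        new_tag = replaced.new_tag("SOME")
        new_tag.string = tag.text  # flat string: the inner <ul>/<li> tags are not copied
        tag.replace_with(new_tag)

# Renaming in place leaves every child where it was.
renamed = BeautifulSoup(markup, "html.parser")
for tag in renamed.select("li li"):
    tag.name = "SOME"

print(replaced)
print(renamed)
```

In the first tree the level-2 item ends up as a single `SOME` tag holding only text, while in the second tree both nested levels survive as `SOME` tags.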
    

    Example

    from bs4 import BeautifulSoup
    
    html = '''<html>
        <body>
            Free Text: Above
            <ul>
                <li> data 1 </li>
                <li>
                    <ul>
                        <li> 
                            <ol start = "321">
                                <li> sub-sub list 1 
                                    <ol>
                                        <li> sub sub sub list </li>
                                    </ol>
                                </li>
                                <li> sub-sub list 2 </li>
                            </ol>
                        </li>
                        <li> sub list 2 </li>
                        <li> sub list 3 </li>
                    </ul>
                </li>
                <li> <p> list type paragraph </p> data 3 </li>
            </ul>
    
            Free Text: Middle
            
            <ul>
                <li> Second UL list </li>
                <li> Second List part 2 </li>
            </ul>
    
            Free Text : Below
        </body>
    </html>'''
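    # Name the parser explicitly; omitting it triggers a warning and parser-dependent output
    soup = BeautifulSoup(html, 'html.parser')

    # Renaming in place keeps each tag's attributes and children
    for tag in soup.select('li li'):
        tag.name = 'SOME'

    print(soup)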
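
If you would rather not use a CSS selector, the same idea can be sketched with `find_all` and `find_parent`. The helper name `rename_nested` and the `sample` string are hypothetical, introduced only for this illustration. The nested tags are collected before any renaming happens, so the ancestor check always sees the original tag names:

```python
from bs4 import BeautifulSoup

def rename_nested(markup, name="li", new_name="SOME"):
    """Rename every <name> tag that has a <name> ancestor; top-level ones stay as they are."""
    soup = BeautifulSoup(markup, "html.parser")
    # Collect first, rename afterwards.
    nested = [tag for tag in soup.find_all(name) if tag.find_parent(name) is not None]
    for tag in nested:
        tag.name = new_name
    return soup

# Hypothetical reduced input modelled on the question's HTML.
sample = (
    '<ul><li>a<ul><li><ol start="321"><li>b</li></ol></li></ul></li></ul>'
    '<ul><li>c</li></ul>'
)
soup = rename_nested(sample)
print(soup)
```

Only the top-level items of each list keep the `li` name, and attributes such as `start="321"` are untouched because no tag is rebuilt.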
    
