Python: Modifying contents of <a> elements

I'm scraping and parsing a web page with Beautiful Soup. The page contains several references to other sources, which look like this:

Shakespeare wrote good, such as in <a href="link_to_source">Romeo and Juliet, IV:ii</a>.

What I'd like to have is:

Shakespeare wrote good, such as in (Romeo and Juliet, IV:ii).

Bear in mind that this is a very long web page with many such lines, and I need to convert all of them. Modifying a single "a" tag won't work for me; I need to modify every "a" tag on the page.

This is something I've tried already:

piska_ps = url_to_soup('https://he.wikisource.org'+a['href']).find_all('p')
p_box = []
for p in piska_ps:
    if p.a:
        for a_link in p.a:
            a_link.string = "("+a_link.string+")"

You may use replace_with to replace a tag:

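piska_ps = url_to_soup('https://he.wikisource.org' + a['href']).find_all('p')
for p in piska_ps:
    # Use a separate loop name so the outer `a` is not shadowed.
    for link in p.find_all('a'):
        # get_text() also works when the link contains nested tags,
        # in which case .string would be None.
        link.replace_with("(" + link.get_text() + ")")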
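
To try this approach without downloading anything, here is a minimal sketch that runs on the example sentence from the question. The function name parenthesize_links and the inline HTML string are assumptions made for illustration, not part of the original code, and html.parser is used only because it ships with Python.

```python
from bs4 import BeautifulSoup


def parenthesize_links(html):
    """Return the text of `html` with every <a> element replaced by "(link text)".

    Hypothetical helper for illustration; not part of the original question.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like result, so replacing tags while looping is safe.
    for link in soup.find_all("a"):
        # get_text() handles links that contain nested tags such as <b> or <i>.
        link.replace_with("(" + link.get_text() + ")")
    return soup.get_text()


sample = (
    '<p>Shakespeare wrote good, such as in '
    '<a href="link_to_source">Romeo and Juliet, IV:ii</a>.</p>'
)
print(parenthesize_links(sample))
```

Because the loop goes through find_all('a') on the whole document, every link on the page is converted, not just the first one in each paragraph.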

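First, p.a is shorthand for p.find('a'), which returns a single tag (the first match). Iterating over it yields that tag's children, not the links in the paragraph. To change the first link's text, assign to it directly: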

piska_ps = url_to_soup('https://he.wikisource.org' + a['href']).find_all('p')
for p in piska_ps:
    # p.a is only the first link in the paragraph; later links are left unchanged.
    if p.a:
        p.a.string = "(" + p.a.get_text() + ")"
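
The difference between p.a and p.find_all('a') can be seen on a small paragraph with two links. This is only a sketch: the HTML string and variable names below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph containing two links.
html = '<p>See <a href="/one">first</a> and <a href="/two">second</a>.</p>'
p = BeautifulSoup(html, "html.parser").p

first_link = p.a                # same as p.find('a'): the first match only
children = list(p.a)            # iterating a tag yields its children (here, its text node)
all_links = p.find_all("a")     # every <a> inside the paragraph

print(first_link.get_text())
print(children)
print([link.get_text() for link in all_links])
```

This is why a loop over p.a never reaches the second link, while a loop over p.find_all('a') visits both.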

