
Python BeautifulSoup findAll formatting

I am trying to scrape some HTML files to generate machine-readable tables of the information they contain.

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('/path/to/some/html/file/M32.html'), 'html.parser')
search = soup.findAll('a')
print(search)

This results in:

[<a href="https://antismash.secondarymetabolites.org/">
<img alt="antiSMASH logo" src="images/bacteria_antismash_logo.svg" style="width:40px;height:unset;"/>
</a>, <a href="https://antismash.secondarymetabolites.org/">
          antiSMASH version 5.1.1
    </a>, <a href="#" id="download-dropdown-link"><img alt="download" src="images/download.svg"/>   Download</a>, <a href="M32.zip">Download all results</a>, <a href="M32.gbk">Download GenBank summary file</a>, <a href="https://antismash.secondarymetabolites.org/#!/about"><img alt="about" src="images/about.svg"/>   About</a>, <a href="https://docs.antismash.secondarymetabolites.org/"><img alt="help" src="images/help.svg"/>   Help</a>, <a href="https://antismash.secondarymetabolites.org/#!/contact"><img alt="contact" src="images/contact.svg"/>   Contact</a>, <a href="#">Overview</a>, <a href="#r2c1">2.1</a>, <a href="#r3c1">3.1</a>, <a href="#r5c1">5.1</a>, <a href="#r7c1">7.1</a>, <a href="#r13c1">13.1</a>, <a href="#r14c1">14.1</a>, <a href="#r15c1">15.1</a>, <a href="#r17c1">17.1</a>, <a href="#r19c1">19.1</a>, <a href="#r20c1">20.1</a>, <a href="#r25c1">25.1</a>, <a href="#r41c1">41.1</a>, <a href="#r42c1">42.1</a>, <a href="#r57c1">57.1</a>, <a href="#r61c1">61.1</a>, <a href="#r62c1">62.1</a>, <a href="#r78c1">78.1</a>, <a href="#r92c1">92.1</a>, <a href="#r100c1">100.1</a>, <a href="#r107c1">107.1</a>, <a href="#r112c1">112.1</a>, <a href="#r116c1">116.1</a>, <a href="#r148c1">148.1</a>, <a href="#r172c1">172.1</a>, <a href="#r240c1">240.1</a>, <a href="#r262c1">262.1</a>, <a href="#r292c1">2

Is there a way to format the output so that each <a> tag found is printed on its own line? It is very hard to find what I am looking for when everything comes out as one long list.

Expected:

<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new

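Try this. findAll returns a list of tags, so print(search) prints the whole list on a single line. Loop over the results and print each tag separately: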

search = soup.findAll('a')  # returns a list-like ResultSet of <a> tags

for tag in search:
    print(tag)
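
Since the goal is a machine-readable table, it may be easier to pull out just the href and the link text instead of printing the raw tags. Note that a tag containing line breaks of its own (like the first antiSMASH link) still spans several lines when printed as-is. Below is a minimal sketch of that idea; the inline html string is a made-up stand-in for the real M32.html, and the rows variable name is only for illustration. find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Small inline sample standing in for the real file (assumption for illustration).
html = """
<a href="https://antismash.secondarymetabolites.org/">
    antiSMASH version 5.1.1
</a>
<a href="M32.zip">Download all results</a>
<a href="#r2c1">2.1</a>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

# One row per link: the href attribute and the whitespace-stripped link text.
rows = [(a.get("href", ""), a.get_text(strip=True)) for a in links]

# Print one tab-separated row per line.
for href, text in rows:
    print(f"{href}\t{text}")
```

If you only want the raw tags, one per line, print(*search, sep='\n') does the same as the loop above.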
