I am trying to scrape some HTML files to generate machine-readable tables of the information they contain.
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('/path/to/some/html/file/M32.html'), 'html.parser')
search = soup.findAll('a')
print(search)
This results in:
[<a href="https://antismash.secondarymetabolites.org/">
<img alt="antiSMASH logo" src="images/bacteria_antismash_logo.svg" style="width:40px;height:unset;"/>
</a>, <a href="https://antismash.secondarymetabolites.org/">
antiSMASH version 5.1.1
</a>, <a href="#" id="download-dropdown-link"><img alt="download" src="images/download.svg"/> Download</a>, <a href="M32.zip">Download all results</a>, <a href="M32.gbk">Download GenBank summary file</a>, <a href="https://antismash.secondarymetabolites.org/#!/about"><img alt="about" src="images/about.svg"/> About</a>, <a href="https://docs.antismash.secondarymetabolites.org/"><img alt="help" src="images/help.svg"/> Help</a>, <a href="https://antismash.secondarymetabolites.org/#!/contact"><img alt="contact" src="images/contact.svg"/> Contact</a>, <a href="#">Overview</a>, <a href="#r2c1">2.1</a>, <a href="#r3c1">3.1</a>, <a href="#r5c1">5.1</a>, <a href="#r7c1">7.1</a>, <a href="#r13c1">13.1</a>, <a href="#r14c1">14.1</a>, <a href="#r15c1">15.1</a>, <a href="#r17c1">17.1</a>, <a href="#r19c1">19.1</a>, <a href="#r20c1">20.1</a>, <a href="#r25c1">25.1</a>, <a href="#r41c1">41.1</a>, <a href="#r42c1">42.1</a>, <a href="#r57c1">57.1</a>, <a href="#r61c1">61.1</a>, <a href="#r62c1">62.1</a>, <a href="#r78c1">78.1</a>, <a href="#r92c1">92.1</a>, <a href="#r100c1">100.1</a>, <a href="#r107c1">107.1</a>, <a href="#r112c1">112.1</a>, <a href="#r116c1">116.1</a>, <a href="#r148c1">148.1</a>, <a href="#r172c1">172.1</a>, <a href="#r240c1">240.1</a>, <a href="#r262c1">262.1</a>, <a href="#r292c1">2
Is there a way to format the output so that each <a> element found is printed on its own line? It is very hard to find what I am looking for when everything comes out as one long line.
Expected:
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
<a something new
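Try this. findAll returns a list-like collection of tags, so you can loop over it and print each tag separately:
search = soup.findAll('a')  # returns a list-like ResultSet of tags
for tag in search:
    print(tag)  # one <a> element per line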
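If you would rather build the whole output as one string (for example, to write it to a file), the same idea can be sketched with str.join. This is only a minimal sketch: the helper names one_per_line and demo and the two-link sample HTML are made up for illustration, and find_all is used as the current name for findAll.

```python
def one_per_line(tags):
    """Return the items as one string, with each item on its own line."""
    # str(tag) gives the same markup that print(tag) would show.
    return "\n".join(str(tag) for tag in tags)


def demo():
    # Imported here so the helper above has no third-party dependency.
    from bs4 import BeautifulSoup

    # Hypothetical sample input standing in for the real M32.html file.
    html = '<a href="#r2c1">2.1</a><a href="#r3c1">3.1</a>'
    soup = BeautifulSoup(html, "html.parser")

    # Prints each <a> element on a separate line.
    print(one_per_line(soup.find_all("a")))
```

Calling demo() should print the two sample links on separate lines. For a quick look without a helper, print(*search, sep="\n") does the same thing in one statement.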