The program below scrapes the text of a web page. How could I get print(output)
to display in an HTML file so it could be loaded in a web browser?
import requests
from bs4 import BeautifulSoup
url = ''
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')
text = soup.find_all(text=True)
output = ''
blacklist = [
'[document]',
'noscript',
'header',
'html',
'meta',
'head',
]
for t in text:
    if t.parent.name not in blacklist:
        output += '{} '.format(t)
print(output)
You can write the output variable to an .html file, wrapped between <pre> and </pre> tags so the browser keeps the line breaks and spacing:
with open('display.html', 'w') as f:
    f.write("<pre>\n")
    for line in output.split("\n"):
        f.write(line)
        f.write("\n")
    f.write("</pre>")
Then open display.html in a web browser; it will display all the text in the output variable.
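One caveat with writing the text straight into the file: if the scraped text contains characters such as < or &, the browser may interpret them as markup. A minimal sketch of a variant that escapes the text first, using the standard library's html.escape, is shown below. The function names render_pre_page and write_pre_page and the title parameter are hypothetical, chosen only for illustration.

```python
import html
from pathlib import Path


def render_pre_page(text: str, title: str = "Scraped text") -> str:
    """Return a small HTML document that shows `text` inside a <pre> element."""
    # Escape <, >, & and quotes so the scraped text is shown literally.
    escaped = html.escape(text)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body><pre>{escaped}</pre></body>\n"
        "</html>\n"
    )


def write_pre_page(text: str, path: str = "display.html") -> Path:
    """Write the rendered page to `path` and return the path that was written."""
    target = Path(path)
    target.write_text(render_pre_page(text), encoding="utf-8")
    return target
```

After the scraping loop, calling write_pre_page(output) would produce display.html in the current directory. If you want the script to open it for you, something like webbrowser.open(write_pre_page(output).resolve().as_uri()) should work with the standard webbrowser module.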