Hey, I am trying to read all <p> tags into an array.
HTML Example:
<p>To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:</p>
<p>It's obvious that not every web publisher pays much attention to validity of his HTML code.</p>
This should result in an Array like:
scraped = ["To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:","It's obvious that not every web publisher pays much attention to validity of his HTML code."]
My current code is:
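import requests
from bs4 import BeautifulSoup

class Webscraper:
    def fullscrape(self, url):
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'lxml')
        # getText() returns all page text as one string, not a list of paragraphs
        content = soup.getText()
        print(content)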
But this does not work properly: it prints the text of the whole page as a single string instead of an array with one entry per <p> tag.
You need to call find_all('p'), then iterate over the results and store the text of each tag in a list:
content = [item.text for item in soup.find_all('p')]
Code:
def fullscrape(self, url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'lxml')
    content = [item.text for item in soup.find_all('p')]
    print(content)
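To check the approach without a network request, here is a minimal sketch that runs the same find_all('p') list comprehension on the two paragraphs from the question. The helper name extract_paragraphs is hypothetical, and it assumes the built-in 'html.parser' backend instead of 'lxml' so that nothing beyond bs4 has to be installed; the two parsers may differ on badly broken markup.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, so no HTTP request is needed.
html = """
<p>To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:</p>
<p>It's obvious that not every web publisher pays much attention to validity of his HTML code.</p>
"""


def extract_paragraphs(markup):
    """Return the text of every <p> element as a list of strings.

    Hypothetical helper for illustration, not part of the original code.
    """
    # Assumption: 'html.parser' (bundled with Python) instead of 'lxml'.
    soup = BeautifulSoup(markup, "html.parser")
    # One list entry per <p> tag, with surrounding whitespace trimmed.
    return [p.get_text().strip() for p in soup.find_all("p")]


scraped = extract_paragraphs(html)
print(scraped)
```

Inside fullscrape, the same helper could be given page.content, and returning the list instead of printing it would let the caller use the array directly.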
Try this:
for c in content:
    print(c)