
Reading all <p> tags into an array with Python BS4

Hey, I am trying to read the text of all <p> tags into an array.

HTML Example:

<p>To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:</p>
<p>It's obvious that not every web publisher pays much attention to validity of his HTML code.</p>

This should result in an array (a Python list) like:

scraped = ["To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:","It's obvious that not every web publisher pays much attention to validity of his HTML code."]

My current code is:

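import requests
from bs4 import BeautifulSoup

class Webscraper:

    def fullscrape(self, url):
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'lxml')
        # getText() returns all of the page's text as a single string
        content = soup.getText()
        print(content)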

But this prints all of the page's text as one string instead of a list with one entry per paragraph.

You need to use find_all('p'), then iterate over the results and store their text in a list:

content = [item.text for item in soup.find_all('p')]
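The difference between the two approaches can be checked without a network request. `getText()` in the question is an alias of `get_text()`, which joins all text in the document into one string, while `find_all('p')` returns each paragraph element separately. This is a minimal sketch that assumes `beautifulsoup4` is installed; it parses the two example paragraphs from a string and uses Python's built-in `"html.parser"` instead of `lxml` only so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# The two example paragraphs from the question, held in a string (no network needed).
html = """
<p>To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:</p>
<p>It's obvious that not every web publisher pays much attention to validity of his HTML code.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text() concatenates all text in the document into ONE string.
whole_text = soup.get_text()
print(type(whole_text))

# find_all("p") yields each <p> element on its own, so a list comprehension
# collects one string per paragraph.
scraped = [p.text for p in soup.find_all("p")]
print(scraped)
```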

Full code:

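import requests
from bs4 import BeautifulSoup

class Webscraper:
    def fullscrape(self, url):
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'lxml')
        # collect the text of every <p> element into a list
        content = [item.text for item in soup.find_all('p')]
        print(content)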
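If the caller needs the list itself rather than printed output, one option is to split the parsing step out so it can be tested on a plain string without network access. The helper name `extract_paragraphs` is hypothetical, and `"html.parser"` stands in for `lxml` only to avoid an extra dependency. Using `get_text(" ", strip=True)` instead of `.text` is a design choice, not something the answer requires: it trims surrounding whitespace and puts a space between the text of nested tags such as `<b>`.

```python
from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Hypothetical helper: return the text of every <p> element as a list of strings."""
    soup = BeautifulSoup(html, "html.parser")
    # " " joins the strings of nested tags with a space; strip=True trims each
    # piece and drops whitespace-only pieces.
    return [p.get_text(" ", strip=True) for p in soup.find_all("p")]


sample = """
<p>  First paragraph.  </p>
<div><p>Second <b>paragraph</b> with a nested tag.</p></div>
"""
paragraphs = extract_paragraphs(sample)
print(paragraphs)  # ['First paragraph.', 'Second paragraph with a nested tag.']
```

Inside `fullscrape`, the method could then end with `return extract_paragraphs(page.content)` instead of printing; `BeautifulSoup` accepts the raw bytes in `page.content` directly.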

To print each paragraph on its own line, try this:

for c in content:
    print(c)
