Article scraping with BeautifulSoup: scraping <p> tags inside <div> tags with ids

Question

I wrote a Python script to pull out particular paragraphs, but I end up getting all the text on the page. I want to scrape only the paragraphs inside a div whose id varies from page to page, e.g.

<div id="content-body-123123">

This id varies from page to page. How can I identify this particular tag and pull out only the paragraphs inside it?

import requests
from bs4 import BeautifulSoup as bs

url = 'http://www.thehindu.com/opinion/op-ed/Does-Beijing-really-want-to-ldquobreak-uprdquo-India/article16875298.ece'
page = requests.get(url)
html = page.content
soup = bs(html, 'html.parser')
# This loop prints every <p> on the page, not only those in the article body
for tag in soup.find_all('p'):
    print(tag.text + '\n')

Answer 1

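Try this. The selector matches any element whose id starts with content-body- and then picks the <p> tags inside it, so a change in the id number should not affect your result: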

from bs4 import BeautifulSoup
import requests

url = 'http://www.thehindu.com/opinion/op-ed/Does-Beijing-really-want-to-ldquobreak-uprdquo-India/article16875298.ece'
page = requests.get(url)
# The 'lxml' parser needs the lxml package; 'html.parser' works without it
soup = BeautifulSoup(page.content, 'lxml')
# [id^='content-body-'] matches ids that begin with "content-body-"
for content in soup.select("[id^='content-body-'] p"):
    print(content.text)
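
To check the selector without hitting the live site, here is a minimal sketch run against an inline snippet. The SAMPLE_HTML markup and the two helper function names are made up for illustration and are not taken from the real page. The second helper shows an alternative that should behave the same way, passing a compiled regex to find() instead of using a CSS selector.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for an article page (not the real site).
SAMPLE_HTML = """
<div id="sidebar"><p>Unrelated sidebar text</p></div>
<div id="content-body-123123">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
<p>Footer text</p>
"""


def article_paragraphs(html):
    """Return text of <p> tags inside any element whose id starts with 'content-body-'."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.select("[id^='content-body-'] p")]


def article_paragraphs_regex(html):
    """Same idea, but locating the div with a regex on the id attribute."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", id=re.compile(r"^content-body-"))
    if body is None:
        return []  # no matching div on this page
    return [p.get_text(strip=True) for p in body.find_all("p")]


print(article_paragraphs(SAMPLE_HTML))
print(article_paragraphs_regex(SAMPLE_HTML))
```

Both helpers skip the sidebar and footer paragraphs because those sit outside the content-body div. The sketch uses html.parser only so that it runs without lxml installed.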
