Screen scraping based on title using python bs4

Question

I have a problem screen scraping with bs4. Here is my code:

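from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 replacement for urllib2

url = "http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
page = urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")  # name the parser explicitly
# Look for every <a> tag whose title attribute is 'Bedroom'
properties = soup.find_all('a', {'title': 'Bedroom'})
for eachproperty in properties:
    print(eachproperty['href'] + ",", eachproperty.string)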

When I inspected the website, all the anchor links have a title structured like this:

1 Bedroom, Residential Apartment in Velachery

But I get no output, and no error either. So how do I tell the program to scrape all the links whose title contains the word "Bedroom"?
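The silent failure described above can be reproduced offline. This is a minimal sketch, assuming bs4 is installed; the HTML below is made-up sample markup in the title format quoted in the question, not the real 99acres page. It shows that the filter {'title': 'Bedroom'} finds nothing, so the loop never prints and nothing raises an error.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the listing page, using the title format
# quoted in the question.
html = """
<a href="/listing-1" title="1 Bedroom, Residential Apartment in Velachery">Flat A</a>
<a href="/listing-2" title="2 Bedroom, Residential Apartment in Velachery">Flat B</a>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string filter is compared with the whole title attribute,
# and no title here is exactly "Bedroom".
properties = soup.find_all('a', {'title': 'Bedroom'})
for eachproperty in properties:
    print(eachproperty['href'] + ",", eachproperty.string)  # never runs
print(len(properties))  # 0
```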

Hope I made it clear.

Answer 1

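You'll need a regular expression here, because you want to match anchor links whose title contains "Bedroom", not only links whose entire title is "Bedroom". A plain string filter is compared with the whole attribute value, whereas a compiled regular expression is tested with its search() method, so it matches wherever "Bedroom" appears in the title: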

import re

# search() semantics: any title that contains 'Bedroom' matches
properties = soup.find_all('a', title=re.compile('Bedroom'))

This gives 47 matches for the URL you've given.
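A self-contained sketch of the regex approach, run against made-up markup (the HTML, hrefs and link texts are hypothetical, not taken from 99acres), alongside two alternative filters that should select the same links: a function passed as the title filter, and a CSS attribute-substring selector through select(). The select() variant assumes Beautiful Soup 4.7 or later, which ships with the soupsieve CSS engine.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: two titles contain "Bedroom", one title does
# not, and the last link has no title attribute at all.
html = """
<a href="/listing-1" title="1 Bedroom, Residential Apartment in Velachery">Flat A</a>
<a href="/listing-2" title="2 Bedroom, Residential Apartment in Velachery">Flat B</a>
<a href="/listing-3" title="Residential Land in Velachery">Plot C</a>
<a href="/contact">Contact</a>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. Compiled regex: bs4 uses the pattern's search(), so a substring is enough.
by_regex = soup.find_all('a', title=re.compile('Bedroom'))

# 2. Function filter: receives the title value, which may be None
#    for tags that have no title attribute.
by_function = soup.find_all('a', title=lambda t: t is not None and 'Bedroom' in t)

# 3. CSS attribute-substring selector (handled by soupsieve).
by_css = soup.select('a[title*="Bedroom"]')

# Same output loop as in the question.
for eachproperty in by_regex:
    print(eachproperty['href'] + ",", eachproperty.string)
```

Any of the three lists can drive the question's print loop. To also catch titles written in lowercase, re.compile('bedroom', re.I) makes the regex case-insensitive.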


Answer 1: accepted (score 2), 2013-09-04 12:03:51
