I have a problem with screen scraping using bs4. Here is my code:
from bs4 import BeautifulSoup
import urllib2
url="http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
properties=soup.findAll('a',{'title':'Bedroom'})
for eachproperty in properties:
    print eachproperty['href']+",", eachproperty.string
When I analyzed the website, the actual title structure looks like this
1 Bedroom, Residential Apartment in Velachery
for all the anchor links. But I do not get any output from this, and no error either. So how do I tell the program to scrape all the anchors whose title contains the word "Bedroom"?
Hope I made it clear.
You'll need to use a regular expression here, as you only want to match those anchor links that have Bedroom somewhere in the title, not as the whole title:
import re
properties = soup.find_all('a', title=re.compile('Bedroom'))
This gives 47 matches for the URL you've given.
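To see why the original filter returns nothing, here is a minimal sketch run against a small inline snippet rather than the live page. The markup, hrefs, and link texts are made up for illustration; only the title format comes from the question. It assumes bs4 is installed and uses the built-in "html.parser". The last lookup uses a CSS attribute selector as an alternative to the regular expression, which is a different technique from the one in the answer above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real listing page.
html = """
<a href="/p/1" title="1 Bedroom, Residential Apartment in Velachery">1 BHK</a>
<a href="/p/2" title="2 Bedroom, Residential Apartment in Velachery">2 BHK</a>
<a href="/p/3" title="Bedroom">Exact title</a>
<a href="/about" title="About us">About</a>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string matches only when the whole attribute value equals it.
exact = soup.find_all('a', title='Bedroom')

# A compiled pattern is searched within the value, so it matches anywhere.
partial = soup.find_all('a', title=re.compile('Bedroom'))

# Alternative: CSS "contains" attribute selector, no regex needed.
via_css = soup.select('a[title*="Bedroom"]')

exact_hrefs = [a['href'] for a in exact]
partial_hrefs = [a['href'] for a in partial]
css_hrefs = [a['href'] for a in via_css]

print(exact_hrefs)    # only the anchor whose title is exactly "Bedroom"
print(partial_hrefs)  # every anchor whose title contains "Bedroom"
print(css_hrefs)      # same anchors as the regex lookup
```

The dictionary form from the question, {'title': 'Bedroom'}, behaves like the first lookup, which is why it found no anchors on a page where every title carries extra text.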