
Screen scraping based on title using python bs4

I have a problem with screen scraping using bs4. The following is my code.

from bs4 import BeautifulSoup
import urllib2
url="http://www.99acres.com/property-in-velachery-chennai-south-ffid?"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
properties=soup.findAll('a',{'title':'Bedroom'})
for eachproperty in properties:
    print eachproperty['href']+",", eachproperty.string

When I analyzed the website, the actual title of every anchor link looks like this:

1 Bedroom, Residential Apartment in Velachery

But I do not get any output, and no error either. So how do I tell the program to scrape all the links whose title contains the word "Bedroom"?

Hope I made it clear.

You'll need to use a regular expression here, because you want to match anchor links whose title contains Bedroom, not only links whose title is exactly Bedroom:

import re

properties = soup.find_all('a', title=re.compile('Bedroom'))

This gives 47 matches for the URL you've given.
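
To see the difference without hitting the live site, here is a minimal sketch that runs both filters against a small inline HTML snippet. The markup, the hrefs and the link texts are made up for illustration and are not taken from the real page; the "html.parser" argument is simply the parser bundled with Python.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the listing page (not the real site's HTML)
html = """
<a href="/p1" title="1 Bedroom, Residential Apartment in Velachery">1 BHK flat</a>
<a href="/p2" title="3 Bedroom, Independent House in Velachery">3 BHK house</a>
<a href="/about" title="About us">About</a>
<a href="/p3" title="Bedroom">Exact title</a>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string only matches when the whole attribute value equals it
exact = soup.find_all("a", title="Bedroom")

# A compiled pattern matches when it is found anywhere in the attribute value
partial = soup.find_all("a", title=re.compile("Bedroom"))

exact_hrefs = [a["href"] for a in exact]
partial_hrefs = [a["href"] for a in partial]

for a in partial:
    print("%s, %s" % (a["href"], a.string))
```

Beautiful Soup applies a compiled pattern with its search() method, which is why the word may appear anywhere in the title. If the capitalization on the page varies, re.compile('bedroom', re.I) should make the match case-insensitive.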

