Scrape Table Data from Website

Question

I am trying to scrape table data from a website using BeautifulSoup4 and Python, and then create an Excel document with the results. So far, I have this:

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://opl.tmhp.com/ProviderManager/SearchResults.aspx?TPI=&OfficeHrs=4&ProgType=STAR&UCCIndicator=No+Preference&Cnty=&NPI=&Srvs=6&Age=All&Gndr=B&SortBy=Distance&ZipCd=78552&SrvsOfrd=0&SpecCd=0&Name=&CntySrvd=0&Plan=H3&WvrProg=0&SubSpecCd=0&AcptPnt=Y&Rad=200&LangCd=99').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string

But it doesn't display the data.

Any ideas?

Answer 1

First of all, the table's class is StandardResultsGrid, not spad.

Second, you don't need to go through tbody. Calling the table directly finds every tr beneath it, so simply use:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr'):

Also note that the original page puts the header row inside tbody for some reason, so you'll have to skip the first row:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr')[1:]:
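To see the loop work end to end without hitting the live site, here is a minimal sketch that runs against an inline HTML string. The sample markup, the column names, and the provider values are invented stand-ins for the real page, and the code is written for Python 3 (print as a function) rather than the Python 2 used in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, so the sketch runs offline.
SAMPLE_HTML = """
<table class="StandardResultsGrid">
  <tbody>
    <tr><th>Name</th><th>City</th></tr>
    <tr><td>Provider A</td><td>Harlingen</td></tr>
    <tr><td>Provider B</td><td>Brownsville</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Calling a tag is shorthand for find_all(), so this finds the table by class.
table = soup("table", {"class": "StandardResultsGrid"})[0]

rows = []
for row in table("tr")[1:]:  # [1:] skips the header row
    tds = row("td")
    rows.append((tds[0].string, tds[1].string))

print(rows)
```

In this sample the header row has no td cells, so without the [1:] slice tds[0] would raise an IndexError on the first iteration.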

Note also that some cells contain nested tables, so you'll have to parse the contents of each td carefully.
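One possible way to deal with nested tables is to search only direct children with recursive=False and to read cell text with get_text() instead of .string. The sketch below assumes an invented sample page. The helper name direct_rows and the sample values are hypothetical, and the real page's markup may need different handling.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second cell of the data row holds a nested table.
SAMPLE_HTML = """
<table class="StandardResultsGrid">
  <tr><td>Name</td><td>Details</td></tr>
  <tr>
    <td>Provider A</td>
    <td><table><tr><td>Phone</td><td>555-0100</td></tr></table></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"class": "StandardResultsGrid"})


def direct_rows(tbl):
    """Return only the rows that belong to tbl itself, not to nested tables."""
    # Rows may sit directly under the table or under a tbody wrapper.
    body = tbl.find("tbody", recursive=False) or tbl
    return body.find_all("tr", recursive=False)


results = []
for row in direct_rows(table)[1:]:  # skip the header row
    cells = row.find_all("td", recursive=False)
    # get_text() flattens nested markup; .string would be None for such cells.
    results.append([cell.get_text(" ", strip=True) for cell in cells])

print(results)
```

A plain table("tr") call would also return the row of the nested table, which is why the sketch restricts each search to direct children.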

solution1
5 2013-05-26 19:41:53
