Scrape Table Data from Website
I am trying to scrape table data from a website using BeautifulSoup4 and Python, and then create an Excel document with the results. So far, I have this:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://opl.tmhp.com/ProviderManager/SearchResults.aspx?TPI=&OfficeHrs=4&ProgType=STAR&UCCIndicator=No+Preference&Cnty=&NPI=&Srvs=6&Age=All&Gndr=B&SortBy=Distance&ZipCd=78552&SrvsOfrd=0&SpecCd=0&Name=&CntySrvd=0&Plan=H3&WvrProg=0&SubSpecCd=0&AcptPnt=Y&Rad=200&LangCd=99').read())
for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string
But it doesn't display the data.
Any ideas?
First of all, the class is StandardResultsGrid, not spad.
Second, you don't need the tbody lookup. Simply use:
for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr'):
Also note that, for some reason, the original page puts the header row inside tbody, so you'll have to skip the first row:
for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr')[1:]:
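The points above can be put together in a small self-contained sketch. It is written for Python 3 and runs against an inline HTML string rather than the live page, so the markup, the provider names, and the helper name extract_rows are assumptions made for illustration; only the StandardResultsGrid class and the skipped header row come from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: the header row sits inside <tbody>.
SAMPLE_HTML = """
<table class="StandardResultsGrid">
  <tbody>
    <tr><th>Name</th><th>City</th></tr>
    <tr><td>Provider A</td><td>Harlingen</td></tr>
    <tr><td>Provider B</td><td>Brownsville</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return (first cell, second cell) text for every non-header row."""
    soup = BeautifulSoup(html, "html.parser")
    # Look the table up by the class named in the answer, not 'spad'.
    table = soup.find("table", class_="StandardResultsGrid")
    rows = []
    # [1:] skips the header row, which lives in the same <tbody> as the data.
    for row in table.find_all("tr")[1:]:
        tds = row.find_all("td")
        rows.append((tds[0].get_text(strip=True), tds[1].get_text(strip=True)))
    return rows


rows = extract_rows(SAMPLE_HTML)
for name, city in rows:
    print(name, city)
```

soup.find(...) and find_all(...) are the long forms of the soup(...) call shorthand used in the question, so the loop does the same thing as the one-liner above.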
And note that some cells contain nested tables, so you'll have to parse the contents of the td elements carefully.
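One possible way to handle nested tables is sketched below, again in Python 3 against made-up markup (the sample HTML, the phone number, and the helper name outer_rows are hypothetical). Two things can go wrong with nested tables. A recursive 'tr' search also returns the rows of the inner tables. And .string returns None for a cell that holds more than one child. The sketch restricts the search to direct children with recursive=False and flattens each cell with get_text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second cell of the data row holds a nested table.
SAMPLE_HTML = """
<table class="StandardResultsGrid">
  <tbody>
    <tr><td>Name</td><td>Details</td></tr>
    <tr>
      <td>Provider A</td>
      <td><table><tr><td>Phone</td><td>555-0100</td></tr></table></td>
    </tr>
  </tbody>
</table>
"""


def outer_rows(html):
    """Return the text of each outer data row, one list of cell strings per row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="StandardResultsGrid")
    # Rows may hang off <tbody> or directly off <table>, depending on the markup.
    body = table.find("tbody", recursive=False) or table
    result = []
    # recursive=False keeps rows of nested tables out; [1:] skips the header row.
    for row in body.find_all("tr", recursive=False)[1:]:
        cells = row.find_all("td", recursive=False)
        # get_text joins all text inside a cell, including nested table cells.
        result.append([cell.get_text(" ", strip=True) for cell in cells])
    return result


nested_rows = outer_rows(SAMPLE_HTML)
print(nested_rows)
```

Whether flattening a nested table into one string is acceptable depends on what the Excel output needs. If the inner values belong in separate columns, each nested table would have to be walked on its own.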