
Scrape Table Data from Website

I am trying to scrape table data from a website using BeautifulSoup4 and Python, and then create an Excel document with the results. So far, I have this:

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://opl.tmhp.com/ProviderManager/SearchResults.aspx?TPI=&OfficeHrs=4&ProgType=STAR&UCCIndicator=No+Preference&Cnty=&NPI=&Srvs=6&Age=All&Gndr=B&SortBy=Distance&ZipCd=78552&SrvsOfrd=0&SpecCd=0&Name=&CntySrvd=0&Plan=H3&WvrProg=0&SubSpecCd=0&AcptPnt=Y&Rad=200&LangCd=99').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string

But it doesn't display the data.

Any ideas?

First of all, the class is StandardResultsGrid, not spad.

Second, you don't need the tbody part. Simply use:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr'):

Also note that, since the header row is included in tbody in the original page for some reason, you'll have to skip the first row, so:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr')[1:]:

Also note that some cells contain nested tables, so you'll have to parse the contents of each td carefully.
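Putting those points together, here is a minimal sketch in Python 3 (the question's code is Python 2). It runs against a small inline HTML sample rather than the live page, so SAMPLE_HTML, the column names, and the helper names extract_rows and rows_to_csv are all assumptions made for illustration; the real page's markup may differ. It uses recursive=False so that rows and cells of nested tables are not mixed in with the outer ones, and get_text() instead of .string, because .string is None for a cell that contains more than one child. The rows are written as CSV text, which Excel can open.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<table class="StandardResultsGrid">
  <tbody>
    <tr><th>Name</th><th>Address</th></tr>
    <tr>
      <td>Dr. A</td>
      <td><table><tr><td>1 Main St</td></tr><tr><td>Harlingen, TX</td></tr></table></td>
    </tr>
    <tr><td>Dr. B</td><td>2 Oak Ave</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return the data rows of the results table as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="StandardResultsGrid")
    if table is None:
        return []
    # Rows may sit directly under <table> or inside a <tbody>.
    body = table.find("tbody", recursive=False) or table
    rows = []
    # recursive=False keeps rows of nested tables out; [1:] skips the header row.
    for tr in body.find_all("tr", recursive=False)[1:]:
        cells = tr.find_all("td", recursive=False)
        # get_text() flattens nested markup into one string per outer cell.
        rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows


def rows_to_csv(rows, header):
    """Serialize the rows as CSV text, with a header line first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # The header names here are assumed, not taken from the real page.
    print(rows_to_csv(extract_rows(SAMPLE_HTML), ["Name", "Address"]))
```

To use this on the real page, the downloaded HTML would be passed to extract_rows in place of SAMPLE_HTML, and the returned text saved to a .csv file.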


