Extracting data from a table on a web page

Question

I am trying to extract data from a table on a web page with Beautiful Soup. I want to get the data inside the cells for each row.

I am new to Python and have tried the following snippet, but it's not working:

import urllib.request
fname = r"C:\Python34\page.htm"
HtmlFile = open(fname, 'r', encoding='utf-8')
source_code = HtmlFile.read()
from bs4 import BeautifulSoup
soup = BeautifulSoup(source_code, 'html.parser')
table = soup.find( "table", {"title":"geoip-demo-results-tbody"} )
rows=list()
for row in table.findAll("tr"):
   rows.append(row)
for tr in rows:
    cols = tr.findAll('td')
    p = col[0].string.strip()
    d = col[1].string.strip()
    print(p)
    print(d)

EDIT: I'm getting this error:

Traceback (most recent call last):
  File "C:\Python34\scrip.py", line 14, in <module>
    d = cols[1].text.strip()
IndexError: list index out of range

for the row:

84.78.229.78 | ES | Santander, Cantabria, Cantabria, Spain, Europe | 39001 | 43.4647, -3.8044 | Orange Espana | Orange Espana

This is the HTML file which generated the above error: www.pastebin.com/tQ3Cp5Wj

Thanks.

Answer 1

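from bs4 import BeautifulSoup

# Read the saved page (adjust the path to your own copy).
fname = r"F:\Vikas\jobs\temp\page.htm"
with open(fname, 'r', encoding='utf-8') as html_file:
    source_code = html_file.read()

soup = BeautifulSoup(source_code, 'html.parser')

# "geoip-demo-results-tbody" is looked up as the id of the <tbody>,
# not as the title of a <table> as in the question.
table = soup.find('tbody', id='geoip-demo-results-tbody')
rows = table.find_all('tr')
for tr in rows:
    cols = tr.find_all('td')
    # .text also works when a cell contains nested tags, where .string would be None.
    p = cols[0].text.strip()
    d = cols[1].text.strip()
    print(p)
    print(d)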
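
To see the two failure modes side by side, here is a minimal sketch run against a small inline sample instead of the real page. The `SAMPLE_HTML` markup and the `extract_pairs` helper are made up for illustration (the actual markup is only in the pastebin link), so treat this as an assumption-laden example rather than a drop-in fix. It shows that the `title` lookup from the question finds nothing, and it skips any row with fewer than two `td` cells, which is one plausible cause of the `IndexError` in the EDIT.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the saved page; the real markup may differ.
SAMPLE_HTML = """
<table>
  <thead><tr><th>IP Address</th><th>Country Code</th></tr></thead>
  <tbody id="geoip-demo-results-tbody">
    <tr><td>84.78.229.78</td><td>ES</td></tr>
    <tr><td>only one cell</td></tr>
  </tbody>
</table>
"""


def extract_pairs(html, tbody_id="geoip-demo-results-tbody"):
    """Return (first cell, second cell) text for each row with at least two cells."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody", id=tbody_id)
    if tbody is None:
        # Nothing matched: return an empty list instead of failing on None.
        return []
    pairs = []
    for tr in tbody.find_all("tr"):
        cols = tr.find_all("td")
        if len(cols) < 2:
            # Short rows would raise IndexError on cols[1], so skip them.
            continue
        pairs.append((cols[0].get_text(strip=True), cols[1].get_text(strip=True)))
    return pairs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# The lookup from the question: no <table> carries that title, so this is None.
missing = soup.find("table", {"title": "geoip-demo-results-tbody"})
print(missing)                     # None
print(extract_pairs(SAMPLE_HTML))  # [('84.78.229.78', 'ES')]
```

Whether to skip short rows or report them depends on what the page actually contains; printing `len(cols)` for each row is a quick way to find out.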

Solution 1
1 Accepted 2015-09-13 14:34:41
