Getting a column from a table using Python and Beautiful Soup

Question

I am new to Python and I want to get the "price" column from a table, but I'm unable to retrieve that data.

Here is what I'm currently doing:

# Libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

for row in table.find_all("tr"):

    col = row.find_all("td")

    print(col[2])
    print("---")

I keep getting a "list index out of range" error. I've read the documentation and tried a few different approaches, but I can't seem to get it working.

Also, I am using Python 3.

Answer 1

The problem is that you are iterating over every tr in the table, including the header tr at the beginning, which you don't need. That row presumably has no td cells (header cells are usually th), so col is empty and col[2] raises the IndexError. Skip it by slicing off the first row:

# Libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Skip the first tr (the header row)
for row in table.find_all("tr")[1:]:
    col = row.find_all("td")

    # The third cell in each data row is the one the question asks for
    print(col[2])
    print("---")
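
To see why the first row trips the index, here is a minimal offline sketch. The inline HTML is a made-up stand-in for the page's table (the cell contents are invented, and I'm assuming the header row uses th cells), so it runs without network access. It also uses get_text() to pull out the cell text rather than printing the whole td tag.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a header row of <th> cells
# followed by data rows of <td> cells. The contents are invented.
SAMPLE_HTML = """
<table>
  <tr><th>Item Title</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Basket</td><td>Looks real</td><td>$15.00</td></tr>
  <tr><td>Dolls</td><td>Hand-painted</td><td>$10,000.52</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find("table").find_all("tr")

# The header row contains no <td> cells, so indexing [2] on it would fail.
header_cells = rows[0].find_all("td")
print(len(header_cells))

# Skip the header row, then take the text of the third cell in each row.
prices = [row.find_all("td")[2].get_text(strip=True) for row in rows[1:]]
print(prices)
```

Slicing with [1:] only works when the header is exactly the first row; if a table has several header rows or short rows elsewhere, a per-row check is safer.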

Answer 2

This probably means that one of the rows has no td tag. You could wrap the print, or whatever else uses col[2], in a try/except block and ignore rows where col is empty or has fewer than three items:

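from urllib.request import urlopen
from bs4 import BeautifulSoup

# Same setup as in the question
html = urlopen("http://pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

for row in table.find_all("tr"):
    col = row.find_all("td")
    try:
        print(col[2])
        print("---")
    except IndexError:
        # Row has fewer than three td cells (e.g. the header row); skip it
        pass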
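
If you would rather not rely on an exception for control flow, an explicit length check does the same job. The sketch below is one possible way to write it; column_text is a hypothetical helper name, and the inline HTML is invented sample data standing in for the real page.

```python
from bs4 import BeautifulSoup


def column_text(table, index):
    """Return the text of cell `index` from every row with enough <td> cells."""
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Rows that are too short (header rows, spacer rows) are skipped.
        if len(cells) > index:
            values.append(cells[index].get_text(strip=True))
    return values


# Invented sample data: a <th> header row, a short row, and two full rows.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Basket</td><td>Looks real</td><td>$15.00</td></tr>
  <tr><td>Only one cell</td></tr>
  <tr><td>Dolls</td><td>Hand-painted</td><td>$10,000.52</td></tr>
</table>
"""

sample_table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")
print(column_text(sample_table, 2))
```

Both versions skip the same rows; the length check just makes the condition visible and avoids hiding an unrelated IndexError raised elsewhere in the loop body.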

Answer 1: 1, accepted, 2017-03-03 22:57:25

Answer 2: 0, 2017-03-03 22:56:13
