Get column from a table with Python and Beautiful Soup

I am new to Python and I want to get the "price" column of data from a table, but I'm unable to retrieve that data.

Here is what I'm currently doing:

# Libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

for row in table.find_all("tr"):

    col = row.find_all("td")

    print(col[2])
    print("---")

I keep getting an "IndexError: list index out of range" error. I've read the documentation and tried a few different approaches, but I can't seem to get it working.

Also, I am using Python 3.

The problem is that you are iterating over every tr inside the table, and the first one is a header tr that you don't need. That row has no td cells, so col is an empty list and col[2] raises the error. Skip the first row by slicing the result:

# Libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# [1:] skips the header row, which has no td cells
for row in table.find_all("tr")[1:]:
    col = row.find_all("td")
    print(col[2])
    print("---")
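
To see why the first row fails, it can help to reproduce the situation offline. The sketch below is only an illustration: SAMPLE_HTML is a made-up snippet, not the real page, and it assumes the header row is built from th cells rather than td cells. It counts the td cells in each row and then collects the third column as plain text with get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical sample, not the real page: a header row of <th> cells
# followed by data rows of <td> cells.
SAMPLE_HTML = """
<table>
  <tr><th>Item Title</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Vegetable Basket</td><td>Looks real</td><td>$15.00</td></tr>
  <tr><td>Russian Nesting Dolls</td><td>Hand-painted</td><td>$10,000.52</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")

# Number of <td> cells found in each row, header row included.
td_counts = [len(row.find_all("td")) for row in table.find_all("tr")]
print(td_counts)

# Skipping the first row leaves only rows that have a third cell.
prices = [row.find_all("td")[2].get_text(strip=True)
          for row in table.find_all("tr")[1:]]
print(prices)
```

The header row reports zero td cells, which is exactly the case where col[2] fails. Slicing with [1:] works as long as the table has a single header row at the top.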

The error probably means that one of the rows has no td tags. You could wrap the print, or whatever use you make of col[2], in a try/except block and ignore the rows where col is empty or has fewer than three items:

for row in table.find_all("tr"):
    col = row.find_all("td")
    try:
        print(col[2])
        print("---")
    except IndexError:
        # Row has fewer than three td cells (e.g. the header row); skip it
        pass
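
An alternative to catching IndexError is to check the number of cells before indexing. The sketch below is one possible way to do that, not the only one: column_texts is a hypothetical helper name, and SAMPLE_HTML is a made-up snippet (with a header row and one deliberately short row) standing in for the real page.

```python
from bs4 import BeautifulSoup


def column_texts(table, index):
    """Return the text of cell `index` from every row that has such a cell."""
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) > index:  # skips header rows and rows that are too short
            values.append(cells[index].get_text(strip=True))
    return values


# Hypothetical sample, not the real page.
SAMPLE_HTML = """
<table>
  <tr><th>Item Title</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Vegetable Basket</td><td>Looks real</td><td>$15.00</td></tr>
  <tr><td>Only one cell</td></tr>
  <tr><td>Dead Parrot</td><td>Resting</td><td>$0.50</td></tr>
</table>
"""

sample_table = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table")
result = column_texts(sample_table, 2)
print(result)
```

Compared with try/except, the explicit length check makes the skipping rule visible in the code, and it does not hide an IndexError raised somewhere else inside the loop body.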
