Get column from a table with Python and Beautiful Soup
I am new to Python and I want to get the "price" column from a table, but I am unable to retrieve that data.
This is what I am currently doing:
# Libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")
for row in table.find_all("tr"):
    col = row.find_all("td")
    print(col[2])
    print("---")
I keep getting "IndexError: list index out of range". I've read the documentation and tried a few different approaches, but I can't seem to get it working.
Also, I am using Python 3.
The problem is that you are iterating over every tr inside the table, and the first one is a header tr that you don't need. Since col[2] fails on it, that row evidently has fewer than three td cells, so just skip it by slicing the list of rows:
# Libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")
# [1:] skips the first tr, which is the header row
for row in table.find_all("tr")[1:]:
    col = row.find_all("td")
    print(col[2])
    print("---")
The error probably means that one of the rows has no td tags. You could wrap the print, or whatever else uses col[2], in a try/except block and ignore the cases where col is empty or has fewer than three items:
for row in table.find_all("tr"):
col = row.find_all("td")
try:
print(col[2])
print("---")
except IndexError:
pass
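An alternative to catching the exception is to check how many cells a row has before indexing into it. This also skips header rows without hard-coding their position. The sketch below is one possible way to do that, under the assumption of a small made-up table: SAMPLE_HTML and cells_at are illustrative names, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a header row with th cells, two complete rows,
# and one short row that would trigger IndexError on col[2].
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Description</th><th>Price</th></tr>
  <tr><td>Vase</td><td>Blue vase</td><td>$15.00</td></tr>
  <tr><td>Short row</td></tr>
  <tr><td>Lamp</td><td>Desk lamp</td><td>$22.50</td></tr>
</table>
"""


def cells_at(html, index=2):
    """Collect the text of cell `index` from every row that has such a cell."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for row in soup.find("table").find_all("tr"):
        col = row.find_all("td")
        if len(col) > index:  # skip rows with too few td cells
            values.append(col[index].get_text(strip=True))
    return values


print(cells_at(SAMPLE_HTML))
```

Compared with try/except, the explicit length check only skips rows that are too short, so an unrelated IndexError raised elsewhere in the loop body would not be silently swallowed.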