AttributeError when web-scraping data using Python

I'm trying to access the data in the table at this URL. I am using the code below, but I get the error AttributeError: 'NoneType' object has no attribute 'find' on the line data = iter(soup.find("table", {"class": "xtTblCon"}).find("div", {"id": "MATURITYY%"}).find_all_next("li")). The code is as follows:

from bs4 import BeautifulSoup
import requests

r = requests.get(
"http://appsso.eurostat.ec.europa.eu/nui/submitViewTableAction.do")
soup = BeautifulSoup(r.content)

data = iter(soup.find("table", {"class": "xtTblCon"}).find("div", {"id": "MATURITYY%"}).find_all_next("li"))

Edit: Sorry, this is the original URL. I had to "customize" the table by clicking on the "Time" toolbar and checking all the years up to 2007. Is there a way to gather all this data?

Thank you.

The column names are repeated, so I used an OrderedDict to remove the duplicates while keeping the order. The rows are grouped in sublists, and the maturity labels are collected in a single list with one entry per row:

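from collections import OrderedDict

# `soup` is the BeautifulSoup object built in the question's code.
data = soup.find("table", {"class": "xTable"})

# Column titles repeat, so fromkeys() drops duplicates while keeping order.
headers = OrderedDict.fromkeys(
    s["title"] for s in soup.find("div", {"class": "xtRowCon"}).find_all("span"))

# One sublist of cell texts per table row.
rows = [[ele.text.strip() for ele in tag.find_all("td")] for tag in data.find_all("tr")]

# One maturity label per row, taken from the list items in the xtTblCon div.
maturity = [ele.find("span", {"class": "label_MATURITY"}).text.strip()
            for ele in soup.find("div", {"class": "xtTblCon"}).find_all("li")]

print(headers.keys())
print(rows)
print(maturity)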

Output:

['2015M05D27', '2015M05D28', '2015M05D29', '2015M06D01', '2015M06D02', '2015M06D03', '2015M06D04', '2015M06D05', '2015M06D08', '2015M06D09']

[[u'-0.24', u'-0.26', u'-0.25', u'-0.25', u'-0.25', u'-0.22', u'-0.25', u'-0.24', u'-0.23', u'-0.22'], [u'-0.22', u'-0.23', u'-0.23', u'-0.23', u'-0.20', u'-0.18', u'-0.20', u'-0.19', u'-0.18', u'-0.16'], [u'-0.15', u'-0.15', u'-0.16', u'-0.16', u'-0.11', u'-0.07', u'-0.09', u'-0.07', u'-0.07', u'-0.04'], [u'-0.04', u'-0.05', u'-0.06', u'-0.07', u'0.01', u'0.08', u'0.06', u'0.08', u'0.09', u'0.13'], [u'0.08', u'0.07', u'0.06', u'0.05', u'0.15', u'0.24', u'0.23', u'0.25', u'0.26', u'0.31'], [u'0.21', u'0.20', u'0.18', u'0.17', u'0.29', u'0.41', u'0.40', u'0.42', u'0.43', u'0.49'], [u'0.34', u'0.32', u'0.30', u'0.29', u'0.43', u'0.56', u'0.56', u'0.58', u'0.59', u'0.66'], [u'0.46', u'0.43', u'0.42', u'0.40', u'0.55', u'0.70', u'0.70', u'0.72', u'0.74', u'0.81'], [u'0.57', u'0.54', u'0.52', u'0.50', u'0.66', u'0.82', u'0.82', u'0.85', u'0.87', u'0.94'], [u'0.66', u'0.63', u'0.61', u'0.59', u'0.75', u'0.93', u'0.93', u'0.95', u'0.98', u'1.06'], [u'0.74', u'0.71', u'0.69', u'0.67', u'0.84', u'1.02', u'1.02', u'1.05', u'1.07', u'1.16'], [u'0.81', u'0.78', u'0.76', u'0.74', u'0.91', u'1.11', u'1.10', u'1.13', u'1.16', u'1.24'], [u'0.88', u'0.84', u'0.82', u'0.80', u'0.97', u'1.18', u'1.17', u'1.20', u'1.23', u'1.32'], [u'0.93', u'0.90', u'0.88', u'0.85', u'1.03', u'1.24', u'1.23', u'1.26', u'1.29', u'1.38'], [u'0.98', u'0.95', u'0.92', u'0.90', u'1.08', u'1.29', u'1.29', u'1.32', u'1.35', u'1.44'], [u'1.02', u'0.99', u'0.97', u'0.94', u'1.12', u'1.34', u'1.33', u'1.36', u'1.40', u'1.49'], [u'1.06', u'1.03', u'1.00', u'0.98', u'1.16', u'1.39', u'1.37', u'1.41', u'1.44', u'1.53'], [u'1.10', u'1.06', u'1.04', u'1.01', u'1.19', u'1.42', u'1.41', u'1.44', u'1.48', u'1.57'], [u'1.13', u'1.09', u'1.07', u'1.04', u'1.22', u'1.46', u'1.45', u'1.48', u'1.51', u'1.61'], [u'1.16', u'1.12', u'1.09', u'1.06', u'1.25', u'1.49', u'1.48', u'1.51', u'1.54', u'1.64'], [u'1.18', u'1.15', u'1.12', u'1.09', u'1.27', u'1.52', u'1.50', u'1.53', u'1.57', u'1.67'], [u'1.20', u'1.17', u'1.14', 
u'1.11', u'1.30', u'1.54', u'1.53', u'1.56', u'1.60', u'1.69'], [u'1.22', u'1.19', u'1.16', u'1.13', u'1.32', u'1.56', u'1.55', u'1.58', u'1.62', u'1.72'], [u'1.24', u'1.21', u'1.18', u'1.15', u'1.34', u'1.59', u'1.57', u'1.60', u'1.64', u'1.74'], [u'1.26', u'1.23', u'1.20', u'1.17', u'1.35', u'1.61', u'1.59', u'1.62', u'1.66', u'1.76'], [u'1.28', u'1.24', u'1.21', u'1.18', u'1.37', u'1.62', u'1.61', u'1.64', u'1.68', u'1.78'], [u'1.29', u'1.26', u'1.23', u'1.20', u'1.38', u'1.64', u'1.62', u'1.66', u'1.70', u'1.80'], [u'1.31', u'1.27', u'1.24', u'1.21', u'1.40', u'1.66', u'1.64', u'1.67', u'1.71', u'1.81'], [u'1.32', u'1.28', u'1.26', u'1.22', u'1.41', u'1.67', u'1.65', u'1.69', u'1.73', u'1.83'], [u'1.33', u'1.30', u'1.27', u'1.23', u'1.42', u'1.68', u'1.67', u'1.70', u'1.74', u'1.84']]

[u'Maturity: 1 year', u'Maturity: 2 years', u'Maturity: 3 years', u'Maturity: 4 years', u'Maturity: 5 years', u'Maturity: 6 years', u'Maturity: 7 years', u'Maturity: 8 years', u'Maturity: 9 years', u'Maturity: 10 years', u'Maturity: 11 years', u'Maturity: 12 years', u'Maturity: 13 years', u'Maturity: 14 years', u'Maturity: 15 years', u'Maturity: 16 years', u'Maturity: 17 years', u'Maturity: 18 years', u'Maturity: 19 years', u'Maturity: 20 years', u'Maturity: 21 years', u'Maturity: 22 years', u'Maturity: 23 years', u'Maturity: 24 years', u'Maturity: 25 years', u'Maturity: 26 years', u'Maturity: 27 years', u'Maturity: 28 years', u'Maturity: 29 years', u'Maturity: 30 years']
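The de-duplication step used for the headers can be seen in isolation in the short sketch below. The repeated titles list is made up for illustration (the exact repetition pattern on the page is an assumption); only the date strings come from the output above.

```python
from collections import OrderedDict

# Hypothetical input: column titles that appear more than once,
# roughly as the answer describes for the span "title" attributes.
titles = ['2015M05D27', '2015M05D27', '2015M05D28', '2015M05D28', '2015M05D29']

# fromkeys() keeps the first occurrence of each key and preserves insertion order.
unique_titles = list(OrderedDict.fromkeys(titles))

print(unique_titles)
```

A plain set would also remove the duplicates, but it would not keep the dates in their original column order.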

If you want to group the rows by maturity, you could create a dict, using an OrderedDict to keep the order:

print(OrderedDict(zip(maturity,rows)))

OrderedDict([(u'Maturity: 1 year', [u'-0.24', u'-0.26', u'-0.25', 
u'-0.25', u'-0.25', u'-0.22', u'-0.25', u'-0.24', u'-0.23', 
u'-0.22']), (u'Maturity: 2 years', [u'-0.22', u'-0.23', u'-0.23', 
u'-0.23', u'-0.20', u'-0.18', u'-0.20', u'-0.19', u'-0.18',    u'-0.16']), (u'Maturity: 3 years', [u'-0.15', u'-0.15', u'-0.16', 
u'-0.16', u'-0.11', ..........................
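Going one step further, the three lists can be combined so that each value is addressable by maturity and date. This is only a sketch on a small excerpt of the output above; the name table is hypothetical, and it assumes every cell is a plain decimal string, so any non-numeric cell would need separate handling before calling float().

```python
from collections import OrderedDict

# Excerpt of the scraped values shown above (first three dates, first two rows).
headers = ['2015M05D27', '2015M05D28', '2015M05D29']
maturity = ['Maturity: 1 year', 'Maturity: 2 years']
rows = [['-0.24', '-0.26', '-0.25'], ['-0.22', '-0.23', '-0.23']]

# Map each maturity label to an ordered {date: value} mapping,
# converting the cell text to float along the way.
table = OrderedDict(
    (label, OrderedDict(zip(headers, (float(v) for v in row))))
    for label, row in zip(maturity, rows)
)

print(table['Maturity: 1 year']['2015M05D28'])
```

Note that zip() stops at the shortest input, so a mismatch between the number of labels and the number of rows would silently drop data rather than raise an error.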

First, there is no table with class xtTblCon; it is actually a div element, so change "table" to "div". Second, there is no div with id MATURITYY%. Each find() call that matches nothing returns None, which is why the chained .find() raises the AttributeError.
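To make the failure mode concrete, here is a minimal sketch that reproduces it on a tiny HTML snippet and shows one way to guard against it. SAMPLE_HTML is a made-up stand-in, not the real page markup; only the class names xtTblCon and label_MATURITY are taken from the answers above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure may differ.
SAMPLE_HTML = """
<div class="xtTblCon">
  <ul>
    <li><span class="label_MATURITY">Maturity: 1 year</span></li>
    <li><span class="label_MATURITY">Maturity: 2 years</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns None when nothing matches. Chaining another .find() on
# that result is what raises "'NoneType' object has no attribute 'find'".
missing = soup.find("table", {"class": "xtTblCon"})
print(missing)

# Checking the intermediate result before going deeper avoids the crash.
container = soup.find("div", {"class": "xtTblCon"})
labels = []
if container is not None:
    labels = [li.get_text(strip=True) for li in container.find_all("li")]

print(labels)
```

Splitting a long chain of find() calls into separate steps like this also makes it easier to see which selector is the one that fails.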
