
Python Error: 'NoneType' object has no attribute 'find_all' using Beautiful Soup

I'm having a problem with some web-scraping code that I'm trying to run. I want to scrape information from a series of links like the following:

http://www2.congreso.gob.pe/Sicr/TraDocEstProc/CLProLey2006.nsf/ec97fee42a2412d5052578bb001539ee/89045fe8ae896e2e0525751c005544cd?OpenDocument

I am trying to scrape certain elements from the table, but I received the following error:

Python Error: 'NoneType' object has no attribute 'find_all'

I know this happens because the table is never actually found. When I run the following simplified code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv
import time

url = 'http://www2.congreso.gob.pe/Sicr/TraDocEstProc/CLProLey2006.nsf/ec97fee42a2412d5052578bb001539ee/89045fe8ae896e2e0525751c005544cd?OpenDocument'

page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')


table = soup.find('table', {'bordercolor' : '#6583A0'})
print(table)

It prints None for the table, meaning the code cannot scrape any of the table's contents. I've been running similar code on similar pages and it finds the table just fine, so I'm not sure why it fails here. I'm new to web scraping, but I'd appreciate any help!

The soup doesn't parse the page content correctly because one malformed tag (an unterminated `</script`) breaks the structure. You have to fix it before parsing:

from bs4 import BeautifulSoup
import requests

url = 'http://www2.congreso.gob.pe/Sicr/TraDocEstProc/CLProLey2006.nsf/ec97fee42a2412d5052578bb001539ee/89045fe8ae896e2e0525751c005544cd?OpenDocument'

page = requests.get(url)
# Close the unterminated </script tag before parsing so html.parser keeps the rest of the page
soup = BeautifulSoup(page.text.replace("</script\n", "</script>"), 'html.parser')

table = soup.find('table', {'bordercolor' : '#6583A0'})
print(table)
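
The one-off replace above could be generalized slightly. What follows is a minimal sketch, not part of the original answer: `repair_script_close` is a hypothetical helper name, the sample input is made up, and it assumes the only damage is a closing `</script` that ends with a newline instead of `>`. It uses only the standard library and could be applied to `page.text` before handing it to BeautifulSoup. Unlike the replace above, it keeps the newline after the repaired tag.

```python
import re

# Matches "</script" followed by optional spaces/tabs and a newline, i.e. a
# closing tag whose ">" is missing. A well-formed "</script>" does not match.
_BROKEN_SCRIPT_CLOSE = re.compile(r"(</script)[ \t]*\n", re.IGNORECASE)


def repair_script_close(html):
    """Return html with unterminated </script tags closed (hypothetical helper)."""
    return _BROKEN_SCRIPT_CLOSE.sub(lambda m: m.group(1) + ">\n", html)


# Illustrative input, invented for this sketch (not the real page).
broken = '<script>var a = 1;</script\n<table bordercolor="#6583A0"></table>'
repaired = repair_script_close(broken)
print(repaired)
```

Running the helper on already well-formed HTML leaves it unchanged, so it should be safe to apply to every page in a batch.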

I think the HTML contains some flaws that make html.parser fail to parse it properly. You can verify this by printing page.text and then printing soup: you will find that the parser has dropped parts of the document.

However, the lxml parser handles the flawed markup successfully, since lxml is more tolerant of ill-formed HTML documents:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv
import time

url = 'http://www2.congreso.gob.pe/Sicr/TraDocEstProc/CLProLey2006.nsf/ec97fee42a2412d5052578bb001539ee/89045fe8ae896e2e0525751c005544cd?OpenDocument'

page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')


table = soup.find('table', {'bordercolor' : '#6583A0'})
print(table)

That should find the table tag correctly.
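
Because `soup.find` returns None when nothing matches, and because the result here depends on which parser is used, the two ideas can be combined. The sketch below is an illustration under stated assumptions rather than the answerer's code: `find_table_with_fallback` is a hypothetical helper, the parser order is an arbitrary choice, and the sample HTML is made up. Parsers that are not installed are skipped by catching `bs4.FeatureNotFound`.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_table_with_fallback(html, attrs, parsers=("html.parser", "lxml", "html5lib")):
    """Try each parser in turn; return (table, parser_name) or (None, None)."""
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser is not installed; move on to the next one.
            continue
        table = soup.find("table", attrs)
        if table is not None:
            return table, parser
    return None, None


# Illustrative, well-formed input invented for this sketch (not the real page).
sample = '<html><body><table bordercolor="#6583A0"><tr><td>cell</td></tr></table></body></html>'
table, parser = find_table_with_fallback(sample, {"bordercolor": "#6583A0"})
if table is None:
    # Guarding here avoids "'NoneType' object has no attribute 'find_all'".
    print("table not found with any available parser")
else:
    cells = [td.get_text(strip=True) for td in table.find_all("td")]
    print(parser, cells)
```

Checking for None before calling `find_all` turns the original AttributeError into an explicit "not found" branch, which makes it easier to spot pages whose markup needs a different parser or a repair step.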


import pandas as pd

df = pd.read_html(
    "http://www2.congreso.gob.pe/Sicr/TraDocEstProc/CLProLey2006.nsf/ec97fee42a2412d5052578bb001539ee/89045fe8ae896e2e0525751c005544cd?OpenDocument")[0]

print(df)
df.to_csv("Data.csv", index=False, header=None)

