'NoneType' object has no attribute 'find' when trying to scrape with Beautiful Soup
I'm having trouble extracting a table from a web page. My code is:
import requests
from bs4 import BeautifulSoup
url ='https://www.ismworld.org/supply-management-news-and-reports/reports/ism-report-on-business/pmi/august/'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'}
data = requests.get(url, headers).text
soup = BeautifulSoup(data, 'html.parser')
t=soup.find("table", {"class": "table table-bordered table-hover table-responsive mb-4"})
print(t)
When I print "t", I get None. What is wrong with the code?
Thanks!
Make your life easier and try pandas.
To get all the tables, try the following:
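import requests
import pandas as pd
from io import StringIO

url = 'https://www.ismworld.org/supply-management-news-and-reports/reports/ism-report-on-business/pmi/august/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
}
# headers is passed by keyword so it is sent as HTTP headers
html = requests.get(url, headers=headers).text
# read_html returns a list of DataFrames, one per table found;
# wrapping the text in StringIO avoids the literal-string deprecation in newer pandas
df = pd.read_html(StringIO(html), flavor='bs4')
print(df[0])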
Sample output for the first table:
Index ... Trend* (Months)
0 Manufacturing PMI® ... 27
1 New Orders ... 1
2 Production ... 27
3 Employment ... 1
4 Supplier Deliveries ... 78
5 Inventories ... 13
6 Customers’ Inventories ... 71
7 Prices ... 27
8 Backlog of Orders ... 26
9 New Export Orders ... 1
10 Imports ... 3
11 OVERALL ECONOMY ... 27
12 Manufacturing Sector ... 27
[13 rows x 7 columns]
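One difference between the question's code and the snippet above is how the headers are passed. The signature is requests.get(url, params=None, **kwargs), so requests.get(url, headers) treats the dict as query-string parameters rather than HTTP headers. The sketch below builds both requests without sending them, to show what each call would put on the wire. The URL and User-Agent value are placeholders, and whether this alone explains the missing table on the real site is an assumption, not something the thread confirms.

```python
import requests

url = "https://example.com/report"  # placeholder URL, not the real site
headers = {"User-Agent": "Mozilla/5.0 (illustrative)"}  # placeholder value

session = requests.Session()

# Equivalent of requests.get(url, headers): the dict lands in `params`.
positional = session.prepare_request(
    requests.Request("GET", url, params=headers)
)

# Equivalent of requests.get(url, headers=headers).
keyword = session.prepare_request(
    requests.Request("GET", url, headers=headers)
)

# Nothing is sent over the network; we only inspect the prepared requests.
print("positional URL:       ", positional.url)
print("positional User-Agent:", positional.headers["User-Agent"])
print("keyword URL:          ", keyword.url)
print("keyword User-Agent:   ", keyword.headers["User-Agent"])
```

In the positional case the User-Agent ends up in the URL and the request goes out with the default python-requests agent, which a site can easily treat as automated traffic.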
The problem is not in your code but in the website's response. After sending the request, try adding the following snippet:
# Save the raw response so it can be inspected in an editor or browser
with open("ismworld.html", "w", encoding="utf-8") as file:
    file.write(data)
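Then inspect the contents of the file. You will notice that the website's response does not contain the table in the first place, because the site detected that your request was automated and blocked you.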
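A quick way to run the same check without reading the whole file is to count the table tags in the response text. This is a minimal sketch using only the standard library. The names TableCounter and count_tables are hypothetical, and the two HTML strings are made-up stand-ins, not the site's actual responses.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "table":
            self.count += 1


def count_tables(html):
    """Return the number of <table> elements opened in `html`."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up examples: a block page without tables and a page with one table.
blocked = "<html><body><h1>Access denied</h1></body></html>"
normal = "<html><body><table class='table'><tr><td>1</td></tr></table></body></html>"

print(count_tables(blocked))
print(count_tables(normal))
```

Calling count_tables(data) on the text returned by requests shows immediately whether there is anything for soup.find to match.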
If you want to dig deeper, there are several ways to avoid this (user-agent randomization, IP rotation, sending requests through a browser, etc.).
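As an illustration of the first option, user-agent randomization can be sketched as picking a header value at random for each request. The list of strings and the helper name build_request are assumptions for the example, and nothing here establishes that this is enough to get past this particular site's detection. The request is only constructed, not sent.

```python
import random
import urllib.request

# Illustrative browser user-agent strings; a real pool would be kept up to date.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:104.0) Gecko/20100101 Firefox/104.0",
]


def build_request(url, rng=random):
    """Build (but do not send) a request with a randomly chosen User-Agent."""
    user_agent = rng.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")  # placeholder URL
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

With requests, the same idea is a matter of passing headers={'User-Agent': random.choice(USER_AGENTS)} as a keyword argument.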
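However, if you would rather focus on processing the data than on the web scraping implementation itself, you could also try WebScrapingAPI. The service handles all of these detection issues by default and has an extract_rules feature that returns elements in JSON format based on the CSS selectors you specify.
Here is a Python example adapted to your case: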
import requests
import json
site = "https://www.ismworld.org/supply-management-news-and-reports/reports/ism-report-on-business/pmi/august/"
url = "https://api.webscrapingapi.com/v1"
extract_rules = {
    "table": {
        "selector": "table.table.table-bordered.table-hover.table-responsive.mb-4",
        "output": "html"
    }
}
params = {
    "api_key": "YOUR_API_KEY",
    "url": site,
    "render_js": "1",
    "extract_rules": json.dumps(extract_rules)
}
response = requests.get(url, params=params)
print(response.text)
And the response:
{"table":["<table class=\"table table-bordered table-hover table-responsive mb-4\">\n<thead>\n<tr>\n<th
class=\"text-center\" scope=\"col\">Index</th>\n<th class=\"text-center\" scope=\"col\">Series Index Aug
</th>\n<th class=\"text-center\" scope=\"col\">Series Index Jul</th>\n<th class=\"text-center\"
scope=\"col\">Percentage Point Change</th>\n<th class=\"text-center\" scope=\"col\">Direction</th>\n<th
class=\"text-center\" scope=\"col\">Rate of Change</th>\n<th class=\"text-center\" scope=\"col\">Trend*
(Months)</th>\n</tr>\n</thead>\n<tbody>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Manufacturing PMI<sup>®</sup></th>\n<td
class=\"text-center\">52.8</td>\n<td class=\"text-center\">52.8</td>\n<td class=\"text-center\">0.0</td>
\n<td class=\"text-center\">Growing</td>\n<td class=\"text-center\">Same</td>\n<td class=\"text-center\">27
</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">New Orders</th>\n<td class=\"text-center\">51.3</td>\n<td
class=\"text-center\">48.0</td>\n<td class=\"text-center\">+3.3</td>\n<td class=\"text-center\">Growing
</td>\n<td class=\"text-center\">From Contracting</td>\n<td class=\"text-center\">1</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Production</th>\n<td class=\"text-center\">50.4</td>\n<td
class=\"text-center\">53.5</td>\n<td class=\"text-center\">-3.1</td>\n<td class=\"text-center\">Growing
</td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">27</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Employment</th>\n<td class=\"text-center\">54.2</td>\n<td
class=\"text-center\">49.9</td>\n<td class=\"text-center\">+4.3</td>\n<td class=\"text-center\">Growing
</td>\n<td class=\"text-center\">From Contracting</td>\n<td class=\"text-center\">1</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Supplier Deliveries</th>\n<td class=\"text-center\">55.1
</td>\n<td class=\"text-center\">55.2</td>\n<td class=\"text-center\">-0.1</td>\n<td class=\"text-center\">
Slowing</td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">78</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Inventories</th>\n<td class=\"text-center\">53.1</td>\n<td
class=\"text-center\">57.3</td>\n<td class=\"text-center\">-4.2</td>\n<td class=\"text-center\">Growing
</td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">13</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Customers’ Inventories</th>\n<td class=\"text-center\">38.9
</td>\n<td class=\"text-center\">39.5</td>\n<td class=\"text-center\">-0.6</td>\n<td class=\"text-center\">
Too Low</td>\n<td class=\"text-center\">Faster</td>\n<td class=\"text-center\">71</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Prices</th>\n<td class=\"text-center\">52.5</td>\n<td
class=\"text-center\">60.0</td>\n<td class=\"text-center\">-7.5</td>\n<td class=\"text-center\">
Increasing</td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">27</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Backlog of Orders</th>\n<td class=\"text-center\">53.0</td>
\n<td class=\"text-center\">51.3</td>\n<td class=\"text-center\">+1.7</td>\n<td class=\"text-center\">
Growing</td>\n<td class=\"text-center\">Faster</td>\n<td class=\"text-center\">26</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">New Export Orders</th>\n<td class=\"text-center\">49.4</td>
\n<td class=\"text-center\">52.6</td>\n<td class=\"text-center\">-3.2</td>\n<td class=\"text-center\">
Contracting</td>\n<td class=\"text-center\">From Growing</td>\n<td class=\"text-center\">1</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th scope=\"row\">Imports</th>\n<td class=\"text-center\">52.5</td>\n<td
class=\"text-center\">54.4</td>\n<td class=\"text-center\">-1.9</td>\n<td class=\"text-center\">Growing
</td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">3</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th class=\"text-center\" colspan=\"4\" scope=\"row\">OVERALL ECONOMY</th>\n
<td class=\"text-center\">Growing</td>\n<td class=\"text-center\">Same</td>\n<td class=\"text-center\">27
</td>\n
</tr>\n<tr>
<!-- Table#-Row#-Column# -->\n<th class=\"text-center\" colspan=\"4\" scope=\"row\">Manufacturing Sector
</th>\n<td class=\"text-center\">Growing</td>\n<td class=\"text-center\">Same</td>\n<td
class=\"text-center\">27</td>\n
</tr>\n</tbody>\n</table>"]}
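The response carries the table as an HTML string inside JSON, so one more step is needed to get at the cell values. The sketch below decodes the JSON and collects the text of each row using only the standard library. RowCollector and parse_table_rows are hypothetical names, and sample_response is a shortened stand-in built from a few values in the response above, not the full payload.

```python
import json
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Join fragments (e.g. around <sup>) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table_rows(response_text, key="table"):
    """Decode the JSON response and return the first table's rows as lists."""
    table_html = json.loads(response_text)[key][0]
    parser = RowCollector()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Shortened stand-in for the response shown above.
sample_response = json.dumps({"table": [
    '<table class="table">\n<thead>\n<tr>\n<th scope="col">Index</th>\n'
    '<th scope="col">Series Index Aug</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n'
    '<th scope="row">Manufacturing PMI<sup>\u00ae</sup></th>\n<td>52.8</td>\n</tr>\n'
    '<tr>\n<th scope="row">New Orders</th>\n<td>51.3\n</td>\n</tr>\n</tbody>\n</table>'
]})

rows = parse_table_rows(sample_response)
for row in rows:
    print(row)
```

In the real script, parse_table_rows(response.text) would take the place of the sample string. If pandas is already installed, passing the decoded HTML to pd.read_html is a shorter route to the same rows.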