
NoneType when I try to scrape with Beautiful Soup


I'm running into a problem with a web page when I try to extract a table. My code is:

import requests
from bs4 import BeautifulSoup

url ='https://www.ismworld.org/supply-management-news-and-reports/reports/ism-report-on-business/pmi/august/'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'}
data = requests.get(url, headers).text
soup = BeautifulSoup(data, 'html.parser')
t=soup.find("table", {"class": "table table-bordered table-hover table-responsive mb-4"})
print(t)
 

When I print "t", I get None. What's wrong with the code?

Thanks!

Make your life easier: try pandas.

To get all the tables, try this:

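import requests
import pandas as pd
from io import StringIO

url = 'https://www.ismworld.org/supply-management-news-and-reports/reports/ism-report-on-business/pmi/august/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
}
# Pass the headers as a keyword argument so they are sent as HTTP headers
html = requests.get(url, headers=headers).text
# read_html returns a list of DataFrames (one per parsed table).
# flavor='bs4' parses with BeautifulSoup (needs bs4 and html5lib installed);
# StringIO avoids the deprecation warning newer pandas versions emit for literal HTML strings.
tables = pd.read_html(StringIO(html), flavor='bs4')
print(tables[0])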

Sample output for the first table:

                     Index  ... Trend* (Months)
0       Manufacturing PMI®  ...              27
1               New Orders  ...               1
2               Production  ...              27
3               Employment  ...               1
4      Supplier Deliveries  ...              78
5              Inventories  ...              13
6   Customers’ Inventories  ...              71
7                   Prices  ...              27
8        Backlog of Orders  ...              26
9        New Export Orders  ...               1
10                 Imports  ...               3
11         OVERALL ECONOMY  ...              27
12    Manufacturing Sector  ...              27

[13 rows x 7 columns]
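One detail that differs from the question's code: this snippet passes headers=headers. The second positional parameter of requests.get is params, so requests.get(url, headers) sends the dictionary as a query string, and the request goes out with the library's default python-requests User-Agent instead of the browser-like one. That may be part of why the site answers differently. The sketch below is an offline illustration (nothing is sent to the site): it prepares both variants with requests.Request so you can compare them. A bare prepared request does not include the Session default headers that requests.get would add.

```python
import requests

url = "https://www.ismworld.org/supply-management-news-and-reports/reports/ism-report-on-business/pmi/august/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}

# How requests.get(url, headers) binds its arguments: the dict goes to `params`
positional = requests.Request("GET", url, params=headers).prepare()

# How requests.get(url, headers=headers) binds them: the dict becomes HTTP headers
keyword = requests.Request("GET", url, headers=headers).prepare()

print(positional.url)                      # the user agent ends up in the query string
print("User-Agent" in positional.headers)  # no custom User-Agent header was set
print(keyword.headers["User-Agent"])       # the browser-like user agent is sent as a header
```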

The problem isn't your code but the website's response. Try adding the following snippet after sending the request:

# Save the raw response so you can inspect what the server actually returned
with open("ismworld.html", "w", encoding="utf-8") as file:
    file.write(data)

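Then check the contents of the file. You'll notice that the response doesn't contain the table in the first place, because the site detected that your request was automated and blocked it.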
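If you'd rather check this from code than by opening the file, a minimal sketch like the following can tell the two failure cases apart: no table in the response at all, versus tables that don't carry the expected classes. The diagnose helper is hypothetical, it assumes data holds the response text as in the question, and the selector is the same one used in the WebScrapingAPI example further down. A missing table can also mean the table is inserted by JavaScript, which requests does not run.

```python
from bs4 import BeautifulSoup

# CSS selector requiring all five classes from the question, in any order
SELECTOR = "table.table.table-bordered.table-hover.table-responsive.mb-4"


def diagnose(html: str) -> str:
    """Return a short hint about why soup.find() came back as None."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if not tables:
        return "no <table> in the response: likely blocked, or the table is added by JavaScript"
    if soup.select_one(SELECTOR) is None:
        return f"{len(tables)} table(s) found, but none with the expected classes"
    return "target table present"


# With the question's variables you would call: print(diagnose(data))
print(diagnose("<html><body>Access denied</body></html>"))
print(diagnose('<table class="table table-bordered table-hover table-responsive mb-4"></table>'))
```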

If you want to dig deeper, there are several ways to avoid this (user-agent randomization, IP rotation, sending requests through a real browser, etc.).
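As a rough illustration of the first technique only (user-agent randomization), here is a sketch that builds a requests.Session with a user agent picked at random from a pool. The USER_AGENTS list and the make_session helper are hypothetical examples, not anything from the site; real anti-bot systems look at much more than this header, so it may not be enough on its own, and it does nothing for IP rotation or JavaScript rendering.

```python
import random

import requests

# Hypothetical, illustrative pool: replace with current, real browser user-agent strings
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:104.0) Gecko/20100101 Firefox/104.0",
]


def make_session(rng: random.Random) -> requests.Session:
    """Create a Session whose User-Agent is chosen at random from USER_AGENTS."""
    session = requests.Session()
    # Session.headers are merged into every request sent through this session
    session.headers.update({
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    })
    return session


session = make_session(random.Random())
print(session.headers["User-Agent"])
# response = session.get(url)  # url as in the question
```

Creating a fresh session per batch of requests gives a new user agent each time, while reusing one session keeps cookies consistent, which looks more like a single browser.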

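However, if you'd rather focus on processing the data than on the scraping implementation itself, you could also try WebScrapingAPI. The service handles these detection issues by default and has an extract_rules feature that returns the elements matching the CSS selectors you specify, in JSON format.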

Here is a Python example adapted to your case:

import requests
import json

site = "https://www.ismworld.org/supply-management-news-and-reports/reports/ism-report-on-business/pmi/august/"
url = "https://api.webscrapingapi.com/v1"

extract_rules = {
    "table": {
        "selector": "table.table.table-bordered.table-hover.table-responsive.mb-4",
        "output": "html"
    }
}

params = {
    "api_key": "YOUR_API_KEY",
    "url": site,
    "render_js": "1",
    "extract_rules": json.dumps(extract_rules)
}

response = requests.get(url, params=params)
print(response.text)

And the response:

{"table":["<table class=\"table table-bordered table-hover table-responsive mb-4\">\n<thead>\n<tr>\n<th
                class=\"text-center\" scope=\"col\">Index</th>\n<th class=\"text-center\" scope=\"col\">Series Index Aug
            </th>\n<th class=\"text-center\" scope=\"col\">Series Index Jul</th>\n<th class=\"text-center\"
                scope=\"col\">Percentage Point Change</th>\n<th class=\"text-center\" scope=\"col\">Direction</th>\n<th
                class=\"text-center\" scope=\"col\">Rate of Change</th>\n<th class=\"text-center\" scope=\"col\">Trend*
                (Months)</th>\n</tr>\n</thead>\n<tbody>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Manufacturing PMI<sup>®</sup></th>\n<td
                class=\"text-center\">52.8</td>\n<td class=\"text-center\">52.8</td>\n<td class=\"text-center\">0.0</td>
            \n<td class=\"text-center\">Growing</td>\n<td class=\"text-center\">Same</td>\n<td class=\"text-center\">27
            </td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">New Orders</th>\n<td class=\"text-center\">51.3</td>\n<td
                class=\"text-center\">48.0</td>\n<td class=\"text-center\">+3.3</td>\n<td class=\"text-center\">Growing
            </td>\n<td class=\"text-center\">From Contracting</td>\n<td class=\"text-center\">1</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Production</th>\n<td class=\"text-center\">50.4</td>\n<td
                class=\"text-center\">53.5</td>\n<td class=\"text-center\">-3.1</td>\n<td class=\"text-center\">Growing
            </td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">27</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Employment</th>\n<td class=\"text-center\">54.2</td>\n<td
                class=\"text-center\">49.9</td>\n<td class=\"text-center\">+4.3</td>\n<td class=\"text-center\">Growing
            </td>\n<td class=\"text-center\">From Contracting</td>\n<td class=\"text-center\">1</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Supplier Deliveries</th>\n<td class=\"text-center\">55.1
            </td>\n<td class=\"text-center\">55.2</td>\n<td class=\"text-center\">-0.1</td>\n<td class=\"text-center\">
                Slowing</td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">78</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Inventories</th>\n<td class=\"text-center\">53.1</td>\n<td
                class=\"text-center\">57.3</td>\n<td class=\"text-center\">-4.2</td>\n<td class=\"text-center\">Growing
            </td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">13</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Customers’ Inventories</th>\n<td class=\"text-center\">38.9
            </td>\n<td class=\"text-center\">39.5</td>\n<td class=\"text-center\">-0.6</td>\n<td class=\"text-center\">
                Too Low</td>\n<td class=\"text-center\">Faster</td>\n<td class=\"text-center\">71</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Prices</th>\n<td class=\"text-center\">52.5</td>\n<td
                class=\"text-center\">60.0</td>\n<td class=\"text-center\">-7.5</td>\n<td class=\"text-center\">
                Increasing</td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">27</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Backlog of Orders</th>\n<td class=\"text-center\">53.0</td>
            \n<td class=\"text-center\">51.3</td>\n<td class=\"text-center\">+1.7</td>\n<td class=\"text-center\">
                Growing</td>\n<td class=\"text-center\">Faster</td>\n<td class=\"text-center\">26</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">New Export Orders</th>\n<td class=\"text-center\">49.4</td>
            \n<td class=\"text-center\">52.6</td>\n<td class=\"text-center\">-3.2</td>\n<td class=\"text-center\">
                Contracting</td>\n<td class=\"text-center\">From Growing</td>\n<td class=\"text-center\">1</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th scope=\"row\">Imports</th>\n<td class=\"text-center\">52.5</td>\n<td
                class=\"text-center\">54.4</td>\n<td class=\"text-center\">-1.9</td>\n<td class=\"text-center\">Growing
            </td>\n<td class=\"text-center\">Slower</td>\n<td class=\"text-center\">3</td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th class=\"text-center\" colspan=\"4\" scope=\"row\">OVERALL ECONOMY</th>\n
            <td class=\"text-center\">Growing</td>\n<td class=\"text-center\">Same</td>\n<td class=\"text-center\">27
            </td>\n
        </tr>\n<tr>
            <!-- Table#-Row#-Column# -->\n<th class=\"text-center\" colspan=\"4\" scope=\"row\">Manufacturing Sector
            </th>\n<td class=\"text-center\">Growing</td>\n<td class=\"text-center\">Same</td>\n<td
                class=\"text-center\">27</td>\n
        </tr>\n</tbody>\n</table>"]}
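In the response above, the value under "table" is a list of HTML strings, so one more step is needed to get at the cells. A minimal sketch, assuming the payload has exactly that shape (the table_rows helper is hypothetical), parses the first match with BeautifulSoup and returns each row as a list of cell texts; the sample is a two-row excerpt of the response shown above.

```python
import json

from bs4 import BeautifulSoup


def table_rows(response_text: str) -> list[list[str]]:
    """Turn a {"table": ["<table>...</table>"]} payload into lists of cell texts."""
    payload = json.loads(response_text)
    # The "table" key holds a list of matches; take the first one
    table_html = payload["table"][0]
    soup = BeautifulSoup(table_html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Header cells (th) and data cells (td) both count as columns
        rows.append([cell.get_text(" ", strip=True) for cell in tr.find_all(["th", "td"])])
    return rows


sample = json.dumps({"table": [
    '<table><thead><tr><th>Index</th><th>Series Index Aug</th></tr></thead>'
    '<tbody><tr><th scope="row">New Orders</th><td>51.3</td></tr></tbody></table>'
]})
for row in table_rows(sample):
    print(row)
# With the real call: rows = table_rows(response.text)
```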
