Parsing text with bs4 works with Selenium but not with requests in Python

Question

This code works and returns the single-digit number that I want, but it is slow and takes a good 10 seconds to complete. I will be running it 4 times for my use, so that's 40 seconds wasted on every run.

```python
from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.FirefoxOptions()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)

driver.get('https://warframe.market/items/ivara_prime_blueprint')

html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')

price_element = soup.find('div', {'class': 'row order-row--Alcph'})
price2 = price_element.find('div', {'class': 'order-row__price--hn3HU'})

price = price2.text

print(int(price))

driver.close()
```

This code, on the other hand, does not work. It returns None.

```python
import requests
from bs4 import BeautifulSoup

url = 'https://warframe.market/items/ivara_prime_blueprint'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

price_element = soup.find('div', {'class': 'row order-row--Alcph'})
price2 = price_element.find('div', {'class': 'order-row__price--hn3HU'})

price = price2.text

print(int(price))
```

My first thought was to add a User-Agent header, but it still did not work. When I print(soup) it gives me HTML, but when I parse it further it starts giving me None, even though it is the same command as in the Selenium example.

Answer 1

The data is loaded dynamically: it is stored inside a <script> tag and the page's JavaScript turns it into the order rows in the browser. BeautifulSoup doesn't render JavaScript, so the div elements you are searching for don't exist in the HTML that requests returns.
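To see the effect without touching the network, here is a minimal offline sketch. The HTML string is a made-up, simplified stand-in for what requests receives; only the `application-state` id and the `payload` / `orders` / `user` / `ingame_name` keys are taken from this answer, and the rest (including the name `ExampleTenno`) is an assumption for illustration.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the raw HTML returned by requests.
STATIC_HTML = """
<html><body>
  <div id="root"></div>
  <script id="application-state" type="application/json">
    {"payload": {"orders": [{"user": {"ingame_name": "ExampleTenno"}}]}}
  </script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# The order rows are built by JavaScript in a browser, so they are absent here.
row = soup.find("div", {"class": "row order-row--Alcph"})

# The embedded JSON, however, is part of the static markup and can be parsed.
script_tag = soup.select_one("#application-state")
state = json.loads(script_tag.string)
names = [order["user"]["ingame_name"] for order in state["payload"]["orders"]]

print(row)
print(names)
```

Calling `.find()` on `row` here would raise an `AttributeError`, because `row` is `None`; that is the failure the requests version in the question runs into.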

As an example, to get the data, you can use:

```python
import json
import requests
from bs4 import BeautifulSoup

url = "https://warframe.market/items/ivara_prime_blueprint"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

# The page state is embedded as JSON in the tag with id="application-state"
script_tag = soup.select_one("#application-state")

json_data = json.loads(script_tag.string)
# Uncomment the lines below to see all the data
# from pprint import pprint
# pprint(json_data)

for data in json_data["payload"]["orders"]:
    print(data["user"]["ingame_name"])
```

Prints:

Rogue_Monarch
Rappei
KentKoes
Tenno61189
spinifer14
Andyfr0nt
hollowberzinho

You can access the data as a dict and access the keys / values.

I'd recommend an online tool to view all the JSON since it's quite large.
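Since the question is after a single price, the same dict can be reduced to one number. This is a minimal sketch, assuming each order also carries `order_type` and `platinum` fields and that `user` has a `status` field. Those field names are not shown in this answer, so check them against the real JSON before relying on them. The sample data and the helper name `lowest_sell_price` are made up for illustration.

```python
# Made-up sample mirroring the json_data["payload"]["orders"] shape used above.
# "order_type", "platinum" and "status" are ASSUMED field names; verify them
# against the real JSON before using this.
sample_json_data = {
    "payload": {
        "orders": [
            {"order_type": "sell", "platinum": 9, "user": {"ingame_name": "A", "status": "ingame"}},
            {"order_type": "sell", "platinum": 7, "user": {"ingame_name": "B", "status": "offline"}},
            {"order_type": "buy", "platinum": 5, "user": {"ingame_name": "C", "status": "ingame"}},
            {"order_type": "sell", "platinum": 8, "user": {"ingame_name": "D", "status": "ingame"}},
        ]
    }
}


def lowest_sell_price(json_data, online_only=True):
    """Return the cheapest sell price, or None if no order matches."""
    prices = [
        order["platinum"]
        for order in json_data["payload"]["orders"]
        if order.get("order_type") == "sell"
        and (not online_only or order.get("user", {}).get("status") == "ingame")
    ]
    return min(prices) if prices else None


print(lowest_sell_price(sample_json_data))
print(lowest_sell_price(sample_json_data, online_only=False))
```

Returning `None` when nothing matches keeps the caller from hitting a `ValueError` from `min()` on an empty list.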

Solution 1
1 2022-12-30 21:59:24
