Parsing text with bs4 works with selenium but does not work with requests in Python

This code works and returns the single-digit number that I want, but it's very slow: it takes a good 10 seconds to complete. I will be running it 4 times for my use, so that's 40 seconds wasted on every run.

from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.FirefoxOptions()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)

driver.get('https://warframe.market/items/ivara_prime_blueprint')

html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')

price_element = soup.find('div', {'class': 'row order-row--Alcph'})
price2=price_element.find('div',{'class':'order-row__price--hn3HU'})

price = price2.text

print(int(price))

driver.quit()  # quit() also shuts down geckodriver; close() only closes the current window

This code, on the other hand, does not work: soup.find() returns None for the order row.

import requests
from bs4 import BeautifulSoup

url='https://warframe.market/items/ivara_prime_blueprint'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

price_element=soup.find('div', {'class': 'row order-row--Alcph'})
price2=price_element.find('div',{'class':'order-row__price--hn3HU'})

price = price2.text

print(int(price))

My first thought was to add a User-Agent header, but it still did not work. When I print(soup), it gives me HTML, but when I parse it further it returns None, even though it's the same command as in the Selenium example.

The order rows are rendered dynamically by JavaScript from data embedded in a <script> tag, so BeautifulSoup doesn't see the rendered <div>s (it doesn't execute JavaScript). The data itself is present in the HTML that requests downloads, inside that <script> tag.
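To see the difference without a network call, here is a minimal, dependency-free sketch. BeautifulSoup's "html.parser" backend is built on Python's standard html.parser module, and neither runs JavaScript, so the stdlib parser sees what BeautifulSoup sees. The page is a hypothetical, heavily trimmed stand-in (only the application-state id is taken from the code in this answer), and StateExtractor is a hypothetical helper, not part of bs4.

```python
import json
from html.parser import HTMLParser

# Hypothetical, trimmed page: the data ships as JSON inside a <script>,
# while the order-row <div>s would only be created later by JavaScript.
RAW_HTML = """
<html><body>
  <div id="root"></div>
  <script id="application-state">
    {"payload": {"orders": [{"user": {"ingame_name": "ExampleUser"}}]}}
  </script>
</body></html>
"""


class StateExtractor(HTMLParser):
    """Collect the text of <script id="application-state"> and record
    whether any <div> with "order-row" in its class attribute appears."""

    def __init__(self):
        super().__init__()
        self._in_state_script = False
        self.state_text = ""
        self.saw_order_row = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("id") == "application-state":
            self._in_state_script = True
        # attribute values can be None, hence the `or ""`
        if tag == "div" and "order-row" in (attrs.get("class") or ""):
            self.saw_order_row = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_state_script = False

    def handle_data(self, data):
        # script text may arrive in several chunks, so accumulate it
        if self._in_state_script:
            self.state_text += data


parser = StateExtractor()
parser.feed(RAW_HTML)
parser.close()

print(parser.saw_order_row)  # False: no JavaScript ran, so no rows exist
state = json.loads(parser.state_text)
print(state["payload"]["orders"][0]["user"]["ingame_name"])  # ExampleUser
```

The same holds for the real page: searching for the order-row <div> finds nothing, but the JSON in the script tag is already there to be parsed.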

As an example, to get the data, you can parse the JSON inside that tag:

import json
import requests
from bs4 import BeautifulSoup


url = "https://warframe.market/items/ivara_prime_blueprint"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

# The page embeds its data as JSON in the element with id="application-state"
script_tag = soup.select_one("#application-state")

json_data = json.loads(script_tag.string)
# Uncomment the line below to see all the data
# from pprint import pprint
# pprint(json_data)

for data in json_data["payload"]["orders"]:
    print(data["user"]["ingame_name"])

Prints:

Rogue_Monarch
Rappei
KentKoes
Tenno61189
spinifer14
Andyfr0nt
hollowberzinho

You can access the data as a dict and read its keys/values.
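Since the original goal was a price rather than a user name, here is a sketch that picks the cheapest sell order. It runs on a hypothetical sample dict shaped like json_data; the answer only confirms the payload/orders/user/ingame_name path, so the "order_type" and "platinum" keys are assumptions to check against your own data (for example with the commented-out pprint call). lowest_sell_price is a hypothetical helper name. The rendered page may also sort or filter orders (for example by seller status), so the result may differ from the first row the Selenium version read.

```python
from typing import Optional

# Hypothetical sample shaped like json_data from the answer.
# ASSUMPTION: each order carries "order_type" ("sell"/"buy") and
# "platinum" (the price); verify these keys against the real JSON.
sample_state = {
    "payload": {
        "orders": [
            {"order_type": "sell", "platinum": 9, "user": {"ingame_name": "A"}},
            {"order_type": "buy", "platinum": 5, "user": {"ingame_name": "B"}},
            {"order_type": "sell", "platinum": 7, "user": {"ingame_name": "C"}},
        ]
    }
}


def lowest_sell_price(state: dict) -> Optional[int]:
    """Return the cheapest sell price, or None if there are no sell orders."""
    prices = [
        order["platinum"]
        for order in state["payload"]["orders"]
        if order.get("order_type") == "sell"
    ]
    return min(prices) if prices else None


print(lowest_sell_price(sample_state))  # 7
```

With real data, call lowest_sell_price(json_data) right after the json.loads step; no browser is involved, so it avoids the Selenium startup cost on each of the 4 runs.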

I'd recommend using an online JSON viewer to explore it, since the JSON is quite large.
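As an offline alternative, json.dumps(json_data, indent=2) or the commented-out pprint call works, but a short recursive helper that lists only the key paths is often easier to scan. key_paths is a hypothetical helper; it visits only the first element of each list by default, so a long list such as the orders does not flood the output.

```python
import json


def key_paths(obj, prefix="", max_items=1):
    """Yield a path string for every leaf value in nested dicts/lists.

    Only the first `max_items` elements of each list are visited.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from key_paths(value, path, max_items)
    elif isinstance(obj, list):
        for i, value in enumerate(obj[:max_items]):
            yield from key_paths(value, f"{prefix}[{i}]", max_items)
    else:
        yield prefix


# Hypothetical sample; in practice pass the answer's json_data instead.
sample = json.loads(
    '{"payload": {"orders": [{"user": {"ingame_name": "A"}},'
    ' {"user": {"ingame_name": "B"}}]}}'
)
for path in key_paths(sample):
    print(path)  # prints: payload.orders[0].user.ingame_name
```

The printed paths map directly onto the indexing used in the answer, e.g. json_data["payload"]["orders"][0]["user"]["ingame_name"].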

See also

Parsing out specific values from JSON object in BeautifulSoup

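Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please cite this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.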

 
© 2020-2024 STACKOOM.COM