
Parsing text with bs4 works with selenium but does not work with requests in Python

This code works and returns the single-digit number that I want, but it is slow and takes a good 10 seconds to complete. I will be running it 4 times for my use case, so that is 40 seconds wasted on every run.

```python
from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.FirefoxOptions()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)

driver.get('https://warframe.market/items/ivara_prime_blueprint')

html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')

price_element = soup.find('div', {'class': 'row order-row--Alcph'})
price2 = price_element.find('div', {'class': 'order-row__price--hn3HU'})

price = price2.text

print(int(price))

driver.close()
```

This code, on the other hand, does not work. It returns None.

```python
import requests
from bs4 import BeautifulSoup

url = 'https://warframe.market/items/ivara_prime_blueprint'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

price_element = soup.find('div', {'class': 'row order-row--Alcph'})
price2 = price_element.find('div', {'class': 'order-row__price--hn3HU'})

price = price2.text

print(int(price))
```

My first thought was to add a User-Agent header, but it still did not work. When I print(soup) I get HTML back, but when I try to parse it further, find() starts returning None, even though it is the same call as in the Selenium example.

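The order rows are built dynamically by JavaScript from data stored inside a <script> tag. requests only downloads the raw HTML and BeautifulSoup only parses it; neither of them runs JavaScript, so the divs you see in Selenium's page_source do not exist in response.text and find() returns None.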
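To make the difference concrete, here is a small offline sketch. The HTML string is a hypothetical, heavily simplified stand-in for what requests receives (the real page is much larger, and the `type` attribute and the `ExampleTenno` name are made up for illustration). It shows that the rendered price div is absent while the JSON inside the script tag is still reachable.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the raw (un-rendered) HTML.
raw_html = """
<html><body>
  <div id="app"></div>
  <script type="application/json" id="application-state">
    {"payload": {"orders": [{"user": {"ingame_name": "ExampleTenno"}}]}}
  </script>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# The div that Selenium sees is created by JavaScript, so it is missing here.
rendered_row = soup.find("div", {"class": "order-row__price--hn3HU"})
print(rendered_row)  # None

# The data itself is present as JSON text inside the script tag.
state_tag = soup.select_one("#application-state")
state = json.loads(state_tag.string)
print(state["payload"]["orders"][0]["user"]["ingame_name"])  # ExampleTenno
```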

As an example, to get the data, you can use:

```python
import json
import requests
from bs4 import BeautifulSoup


url = "https://warframe.market/items/ivara_prime_blueprint"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

# The page embeds its data as JSON in the element with id="application-state"
script_tag = soup.select_one("#application-state")

# Parse the JSON text of that tag into a Python dict
json_data = json.loads(script_tag.string)
# Uncomment the lines below to see all the data
# from pprint import pprint
# pprint(json_data)

# Each order is a dict; print the in-game name of the user who placed it
for data in json_data["payload"]["orders"]:
    print(data["user"]["ingame_name"])
```

Prints:

Rogue_Monarch
Rappei
KentKoes
Tenno61189
spinifer14
Andyfr0nt
hollowberzinho

json_data is a regular Python dict, so you can access its keys and values as usual.

I'd recommend an online tool to view all the JSON since it's quite large.
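As a rough sketch of how the price from the question could be pulled out of that dict: the helper below returns the lowest sell price. Note that the `platinum` and `order_type` keys are assumptions about the shape of each order (only `user` and `ingame_name` are confirmed above), and `lowest_sell_price` and the sample data are hypothetical, so check the real JSON before relying on it.

```python
def lowest_sell_price(state):
    """Return the lowest 'sell' price in the parsed state, or None if absent.

    Assumes each order has 'platinum' and 'order_type' keys (not verified).
    """
    orders = state.get("payload", {}).get("orders", [])
    prices = [
        order["platinum"]
        for order in orders
        if order.get("order_type") == "sell" and "platinum" in order
    ]
    return min(prices) if prices else None


# Hypothetical sample shaped like json_data; the values are made up.
sample_state = {
    "payload": {
        "orders": [
            {"order_type": "sell", "platinum": 45, "user": {"ingame_name": "A"}},
            {"order_type": "sell", "platinum": 40, "user": {"ingame_name": "B"}},
            {"order_type": "buy", "platinum": 30, "user": {"ingame_name": "C"}},
        ]
    }
}

print(lowest_sell_price(sample_state))  # 40
```

With the real page you would call `lowest_sell_price(json_data)` instead. Returning None when nothing matches avoids the AttributeError you get from chaining calls on a missing element.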

See also

Parsing out specific values from JSON object in BeautifulSoup
