I'm trying to scrape the webpage https://www.cars.com/dealers/5374692/carvana-touchless-delivery-to-your-home/
On this page there's a "See All Vehicles" button, and I'm trying to get the href of that tag.
So far I've made this work using Selenium, but opening a webdriver every time takes too much time, so I'd like to avoid Selenium.
With BeautifulSoup, however, I get a NoneType error. My code is:
import requests
from bs4 import BeautifulSoup
import re

base_url = 'https://www.cars.com/'

def request_page(url):
    session = requests.Session()
    my_headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    }
    response = session.get(url, headers=my_headers)
    soup = BeautifulSoup(re.sub("<!---->", "", response.text), "lxml")
    return soup

def dealers_subpage(url):
    try:
        soup = request_page(url)
        descript = soup.find('dpp-update-inventory-link')
        print(descript.prettify())
        link = descript.find('a')['href']
        return base_url + str(link)
    except Exception as e:
        print(e, url)

dealers_subpage('https://www.cars.com/dealers/5374692/carvana-touchless-delivery-to-your-home/')
For this code I get the following output:
<dpp-update-inventory-link new-count="" party-id="74424458" used-count="100" zipcode="11763">
</dpp-update-inventory-link>
'NoneType' object is not subscriptable https://www.cars.com/dealers/5374692/carvana-touchless-delivery-to-your-home/
My question is: why is it not reading the a tag that is present there?
Note: use incognito/private mode to visit the webpage, because in a normal window it redirects to another page.
The page is loaded dynamically, so you cannot get the a tag inside dpp-update-inventory-link from the HTML that requests returns. Even when you print descript.prettify(), no a tag is present, which means it is rendered by JavaScript in the browser. To read the rendered tag itself you would have to use Selenium.
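The check described above can be sketched as follows. This is a minimal illustration, assuming the static HTML looks like the output printed in the question; the helper name find_inventory_anchor is hypothetical, and the built-in "html.parser" is used here instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Static HTML as returned by requests (copied from the output in the question).
STATIC_HTML = """
<dpp-update-inventory-link new-count="" party-id="74424458" used-count="100" zipcode="11763">
</dpp-update-inventory-link>
"""

def find_inventory_anchor(html):
    """Return the href of the <a> inside the custom element, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find("dpp-update-inventory-link")
    if element is None:
        return None  # the custom element itself is missing
    anchor = element.find("a")
    if anchor is None:
        return None  # element exists, but its <a> is only added by JavaScript
    return anchor.get("href")

print(find_inventory_anchor(STATIC_HTML))  # prints None: no <a> in the static HTML
```

Checking for None before indexing with ['href'] is what avoids the "'NoneType' object is not subscriptable" error from the question.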
For the current requirement, though, you can generate the link yourself, because the URL of that link is built from attributes of descript such as party-id and zipcode. So:
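def dealers_subpage(url):
    soup = request_page(url)
    descript = soup.find('dpp-update-inventory-link')
    # The <a> is added by JavaScript, so read the attributes it is built from.
    party_id = descript['party-id']
    zipcode = descript['zipcode']
    # base_url already ends with '/', so no extra slash is added before 'for-sale'.
    url = f"{base_url}for-sale/searchresults.action/?dlId={party_id}&zc={zipcode}&searchSource=CAPTIVE_BLENDED"
    return url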
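A slightly more defensive variant of the same idea is sketched below, using only the standard library to assemble the URL. The function name build_inventory_url is hypothetical. The path and query parameters are the ones used in the function above, and whether the site still accepts them is an assumption. The function takes any dict of attributes, so a BeautifulSoup tag's attrs dictionary (for example descript.attrs) can be passed directly.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://www.cars.com/"

def build_inventory_url(attrs, base_url=BASE_URL):
    """Build the inventory search URL from the custom element's attributes."""
    # Fail early with a clear message if a required attribute is missing or empty.
    missing = [key for key in ("party-id", "zipcode") if not attrs.get(key)]
    if missing:
        raise ValueError(f"missing attributes: {', '.join(missing)}")
    # urlencode escapes the values and keeps the insertion order of the dict.
    query = urlencode({
        "dlId": attrs["party-id"],
        "zc": attrs["zipcode"],
        "searchSource": "CAPTIVE_BLENDED",
    })
    # urljoin avoids a doubled slash between the base URL and the path.
    return urljoin(base_url, "for-sale/searchresults.action/") + "?" + query

# Attribute values taken from the output shown in the question.
example_attrs = {"new-count": "", "party-id": "74424458", "used-count": "100", "zipcode": "11763"}
print(build_inventory_url(example_attrs))
# https://www.cars.com/for-sale/searchresults.action/?dlId=74424458&zc=11763&searchSource=CAPTIVE_BLENDED
```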