How to get the elements from two dropdown lists (web scraping with Python)
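I want to scrape this website (www.autocar.co.uk). To do that, I want to select each automaker in the dropdown menu and then each of its models. I always want to skip the "All models" option.
I am just starting out with coding, so I would really appreciate your input.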
Desired output:
Auto OEM (e.g. Tesla)
All Models of Tesla (e.g. Model 3, Y...)
Example:
Abarth
595
595 Competi...
124 Spider...
695 Bopisto...
AC Cars
#Skip due to no model
My current code:
from bs4 import BeautifulSoup
import requests

# Inputs/URLs to scrape:
URL = ('https://www.autocar.co.uk/car-review/tesla')
(response := requests.get(URL)).raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')
overview = soup.find()
oems = soup.select('select.car-finder-make-chooser option')
for oem_loop in oems[1:]:
    print(oem_loop.text)
    models = soup.select('select.car-finder-model-chooser option')
    for model_loop in models:
        print(model_loop)
My output right now:
Abarth
<option value="0">Model</option>
AC Cars
<option value="0">Model</option>
AC Schnitzer
<option value="0">Model</option>
Aiways
<option value="0">Model</option>
Allard
<option value="0">Model</option>
Alfa Romeo
<option value="0">Model</option>
Alpina
<option value="0">Model</option>
Alpine
<option value="0">Model</option>
Ariel
<option value="0">Model</option>
Ascari
<option value="0">Model</option>
Aston Martin
<option value="0">Model</option>
Audi
<option value="0">Model</option>
BAC
<option value="0">Model</option>
Bentley
<option value="0">Model</option>
Bizzarrini
<option value="0">Model</option>
BMW
<option value="0">Model</option>
Borgward
<option value="0">Model</option>
Bowler
<option value="0">Model</option>
Bugatti
<option value="0">Model</option>
BYD
<option value="0">Model</option>
...
The following code gives you a list of makes, models, and their corresponding URLs, and saves a CSV file with the details:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://www.autocar.co.uk/"
s = requests.Session()
r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

full_car_list = []
# One (make name, make id, AJAX URL for that make's models) tuple per <option> in the make dropdown
car_list = [(x.text, x.get("value"), f'https://www.autocar.co.uk/ajax/car-models/{x.get("value")}/0') for x in soup.select_one('#edit-make').select('option')]
for x in car_list:
    r = s.get(x[2])
    try:
        # Each item pairs a relative URL (item[0]) with a model name (item[1])
        for item in r.json()['options'].items():
            full_car_list.append((x[0], item[1], f'https://www.autocar.co.uk{item[0]}'))
    except Exception:
        # The response held no usable model list for this make
        full_car_list.append((x[0], 'no models', f'https://www.autocar.co.uk/vehicles/{x[0]}'))

# Drop the first row, which corresponds to the dropdown's first (placeholder) option
cars_df = pd.DataFrame(full_car_list[1:], columns=['Make', 'Model', 'Url'])
# Skip the "All models" entries
cars_df = cars_df[cars_df.Model != 'All models']
cars_df.to_csv('makes_models.csv')
print(cars_df.head(15))
This writes a CSV file and prints the head of the DataFrame:
    Make          Model                      Url
1   Abarth        595                        https://www.autocar.co.uk/car-review/abarth/595
2   Abarth        595 Competizione           https://www.autocar.co.uk/car-review/abarth/595-competizione
3   Abarth        124 Spider 2016-2019       https://www.autocar.co.uk/car-review/abarth/124-spider-2016-2019
4   Abarth        695 Biposto 2015-2016      https://www.autocar.co.uk/car-review/abarth/695-biposto-2015-2016
5   AC Cars       no models                  https://www.autocar.co.uk/vehicles/AC Cars
7   AC Schnitzer  ACS3 Sport                 https://www.autocar.co.uk/car-review/ac-schnitzer/acs3-sport
8   AC Schnitzer  ACS1                       https://www.autocar.co.uk/car-review/ac-schnitzer/acs1
9   AC Schnitzer  ACS5 Sport                 https://www.autocar.co.uk/car-review/ac-schnitzer/acs5-sport
10  Aiways        no models                  https://www.autocar.co.uk/vehicles/Aiways
12  Allard        J2X MkII                   https://www.autocar.co.uk/car-review/allard/j2x-mkii
[...]
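To get from these rows to the layout asked for in the question (each make on its own line, with its models listed underneath), the rows can be grouped by make. The following is only a minimal sketch: `group_models` and `format_grouped` are hypothetical helper names, the sample rows are copied from the output above, and it assumes a make without models should still be printed, just with nothing beneath it.

```python
# Sample rows in the same (make, model, url) shape as full_car_list;
# the values are copied from the printed output.
rows = [
    ("Abarth", "595", "https://www.autocar.co.uk/car-review/abarth/595"),
    ("Abarth", "595 Competizione", "https://www.autocar.co.uk/car-review/abarth/595-competizione"),
    ("AC Cars", "no models", "https://www.autocar.co.uk/vehicles/AC Cars"),
    ("AC Schnitzer", "ACS3 Sport", "https://www.autocar.co.uk/car-review/ac-schnitzer/acs3-sport"),
]


def group_models(rows, skip=("All models", "no models")):
    """Hypothetical helper: map each make to the list of its model names."""
    grouped = {}
    for make, model, _url in rows:
        # Register the make even if it turns out to have no real models.
        models = grouped.setdefault(make, [])
        if model not in skip:
            models.append(model)
    return grouped


def format_grouped(grouped):
    """Hypothetical helper: one line per make, its models indented below."""
    lines = []
    for make, models in grouped.items():
        lines.append(make)
        lines.extend(f"    {model}" for model in models)
    return "\n".join(lines)


grouped = group_models(rows)
print(format_grouped(grouped))
```

With the real data you could pass `full_car_list[1:]` instead of the sample rows. Plain dicts keep insertion order, so the makes come out in the same order as the dropdown.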
I recommend having a look at the bs4 documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
as well as the pandas documentation: https://pandas.pydata.org/docs/
and the documentation for Python requests: https://requests.readthedocs.io/en/latest/
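If you would rather not rely on a broad `except`, the JSON handling from the loop can be pulled into a small function that can be tried out without any network access. This is a sketch under assumptions: `rows_for_make` is a hypothetical name, and the payload shape (an `options` mapping from relative URL to model name) is inferred only from how the code above reads `r.json()['options']`. The key used for "All models" in the sample is made up for illustration.

```python
BASE_URL = "https://www.autocar.co.uk"


def rows_for_make(make, payload, base=BASE_URL):
    """Hypothetical helper: turn one decoded JSON payload into (make, model, url) rows.

    Returns an empty list when the payload has no usable 'options' mapping,
    and leaves out the "All models" entry.
    """
    options = payload.get("options") if isinstance(payload, dict) else None
    if not isinstance(options, dict):
        return []
    return [
        (make, name, f"{base}{path}")
        for path, name in options.items()
        if name != "All models"
    ]


# Illustrative payload; the "All models" key is an assumption, not real site data.
sample = {
    "options": {
        "/car-review/abarth": "All models",
        "/car-review/abarth/595": "595",
    }
}
print(rows_for_make("Abarth", sample))
print(rows_for_make("AC Cars", {}))
```

Inside the loop this would replace the `try`/`except` body with something like `rows_for_make(x[0], r.json())`, though `r.json()` itself can still raise if the response is not valid JSON.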