
Scraping specific dd item using Python BeautifulSoup

I am trying to extract a specific "dd" element from a website using Python.

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
           'AppleWebKit/537.36 (KHTML, like Gecko) '\
           'Chrome/75.0.3770.80 Safari/537.36'}

url = "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')


vehicle=[]

for i in soup.findAll("div", class_="message-userExtras"):
    for item in soup.find_all("dd")[::-1]:
        vehicle.append(item.get_text())
print(vehicle)

I am trying to extract only the list of vehicles from the URL. My output should look like this:

2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
Tahoe/Tundra/Fusion
2019 Ford Ranger Lariat - Saber; 2014 GMC Terrain

But my result is not what I expected.

Use the regular expression module re to search for the dt tag whose text is Vehicle, then find the next dd tag.

import re
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
           'AppleWebKit/537.36 (KHTML, like Gecko) '\
           'Chrome/75.0.3770.80 Safari/537.36'}

url = "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

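# Search inside each user-info block separately, not the whole page.
for item in soup.find_all("div", class_='message-userExtras'):
    # string= is the current name of the older text= argument (bs4 4.4.0+).
    label = item.find('dt', string=re.compile("Vehicle"))
    if label is not None:  # skip blocks that have no Vehicle entry
        print(label.find_next('dd').text.strip())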

Output:

2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
Tahoe/Tundra/Fusion
2019 Ford Ranger Lariat - Saber; 2014 GMC Terrain
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
2019 Ranger Lariat - 2019 Honda CRV Touring
2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
2019 Ranger Lariat SuperCab
2019 Ranger Lariat
Ranger Lariat
2019 Ford Ranger Lariat
Ranger Lariat
Ranger Lariat
2019 Ranger XLT 301A SuperCrew 4X4 2015 Ecoboost Mustang 50 Year Appereance Package convertible
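
The dt-then-dd lookup can also be tried offline, without requesting the forum page. The sketch below is only an illustration: SAMPLE_HTML is a hand-written fragment whose structure (dt/dd pairs inside div.message-userExtras) is an assumption about the page, and extract_vehicles is a hypothetical helper name, not part of BeautifulSoup.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: an assumed shape for the forum's user-info blocks,
# written by hand for this example and not copied from the real page.
SAMPLE_HTML = """
<div class="message-userExtras">
  <dl><dt>Joined</dt><dd>Jan 1, 2019</dd></dl>
  <dl><dt>Vehicle</dt><dd> 2019 Ford Ranger XLT FX4 </dd></dl>
</div>
<div class="message-userExtras">
  <dl><dt>Joined</dt><dd>Feb 2, 2019</dd></dl>
</div>
<div class="message-userExtras">
  <dl><dt>Vehicle</dt><dd>Tahoe/Tundra/Fusion</dd></dl>
</div>
"""


def extract_vehicles(html):
    """Return the dd text that follows each 'Vehicle' dt, one per block."""
    soup = BeautifulSoup(html, "html.parser")
    vehicles = []
    for block in soup.find_all("div", class_="message-userExtras"):
        # Look for the label inside this block only.
        label = block.find("dt", string=re.compile("Vehicle"))
        if label is None:
            continue  # this block has no Vehicle entry
        value = label.find_next("dd")
        if value is not None:
            vehicles.append(value.get_text(strip=True))
    return vehicles


vehicles = extract_vehicles(SAMPLE_HTML)
print(vehicles)
```

Searching from each block (block.find) and not from soup keeps one user's fields from being mixed with another's, and the None check avoids an AttributeError when a block has no Vehicle label.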

