
Scraping a specific dd item using Python BeautifulSoup

I am trying to extract a specific `dd` element from a website using Python:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
           'AppleWebKit/537.36 (KHTML, like Gecko) '\
           'Chrome/75.0.3770.80 Safari/537.36'}

url = "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')


vehicle=[]

for i in soup.findAll("div", class_="message-userExtras"):
    for item in soup.find_all("dd")[::-1]:
        vehicle.append(item.get_text())
print(vehicle)

I want to extract only the vehicle list from the URL, and my output should look like this:

2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
Tahoe/Tundra/Fusion
2019 Ford Ranger Lariat - Saber; 2014 GMC Terrain

But the result is not what I expect.
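A minimal offline reproduction of the problem may help show what goes wrong. It uses a small hand-written HTML snippet (hypothetical markup modelled on the `message-userExtras` blocks, not copied from the real page). Because the inner loop calls `soup.find_all("dd")` on the whole document instead of on the current block `i`, every `dd` on the page, vehicle or not, is collected once per user block.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two posts, each with a "Location" and a "Vehicle" entry.
HTML = """
<div class="message-userExtras">
  <dl><dt>Location</dt><dd>Texas</dd></dl>
  <dl><dt>Vehicle</dt><dd>2019 Ford Ranger XLT FX4</dd></dl>
</div>
<div class="message-userExtras">
  <dl><dt>Location</dt><dd>Ohio</dd></dl>
  <dl><dt>Vehicle</dt><dd>Tahoe/Tundra/Fusion</dd></dl>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

vehicle = []
for i in soup.find_all("div", class_="message-userExtras"):
    # Same logic as the question: searches the whole soup, not the block `i`,
    # and collects every <dd> regardless of which <dt> label it belongs to.
    for item in soup.find_all("dd")[::-1]:
        vehicle.append(item.get_text())

print(vehicle)
```

With two blocks and four `dd` tags, the list ends up with eight entries: the reversed set of all `dd` texts, repeated once per block, with the "Location" values mixed in.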

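Use a regular expression (the `re` module) to find the `dt` tag whose text contains "Vehicle", then call `find_next('dd')` to get the `dd` tag that follows it. Run the search inside each `message-userExtras` block rather than on the whole `soup`.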

import re

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
           'AppleWebKit/537.36 (KHTML, like Gecko) '
           'Chrome/75.0.3770.80 Safari/537.36'}

url = "https://www.ranger5g.com/forum/threads/pre-collision-assist.3239"
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

for item in soup.find_all("div", class_='message-userExtras'):
    # Look only inside this user's block for the <dt> whose text contains "Vehicle"
    dt = item.find('dt', string=re.compile("Vehicle"))
    if dt is not None:  # skip users who have not filled in a vehicle
        # The vehicle text is in the <dd> that follows the matching <dt>
        print(dt.find_next('dd').text.strip())
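The same technique can be tested offline as a reusable function. This is a sketch under assumptions: the HTML is hypothetical (each label/value pair is assumed to sit in its own `<dl>`), and the function name `extract_vehicles` is made up for illustration. It swaps `find_next('dd')` for `find_next_sibling('dd')`, so the lookup stays inside the same `<dl>` and cannot run on into a later post if a `dt` has no matching `dd`.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the second post has no "Vehicle" entry at all.
HTML = """
<div class="message-userExtras">
  <dl><dt>Location</dt><dd>Texas</dd></dl>
  <dl><dt>Vehicle</dt><dd> 2019 Ford Ranger XLT FX4 </dd></dl>
</div>
<div class="message-userExtras">
  <dl><dt>Location</dt><dd>Ohio</dd></dl>
</div>
<div class="message-userExtras">
  <dl><dt>Vehicle</dt><dd>Tahoe/Tundra/Fusion</dd></dl>
</div>
"""


def extract_vehicles(soup):
    """Return the <dd> text after each 'Vehicle' <dt>, one per user block (hypothetical helper)."""
    vehicles = []
    for block in soup.find_all("div", class_="message-userExtras"):
        # Search only inside this block; string= matches the <dt>'s text.
        dt = block.find("dt", string=re.compile("Vehicle"))
        if dt is None:
            continue  # this user has no vehicle entry
        # The value is the next <dd> sibling within the same <dl>.
        dd = dt.find_next_sibling("dd")
        if dd is not None:
            vehicles.append(dd.get_text().strip())
    return vehicles


soup = BeautifulSoup(HTML, "html.parser")
print(extract_vehicles(soup))
```

Blocks without a "Vehicle" label are skipped instead of raising `AttributeError`, so only the two vehicle strings are returned.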

Output:

2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
Tahoe/Tundra/Fusion
2019 Ford Ranger Lariat - Saber; 2014 GMC Terrain
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
2019 Ranger Lariat - 2019 Honda CRV Touring
2019 Ford Ranger XLT FX4
2019 Ford Ranger Lariat FX4, 1973 Mercury Capri
2019 Ranger Lariat SuperCab
2019 Ranger Lariat
Ranger Lariat
2019 Ford Ranger Lariat
Ranger Lariat
Ranger Lariat
2019 Ranger XLT 301A SuperCrew 4X4 2015 Ecoboost Mustang 50 Year Appereance Package convertible
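The output repeats some lines, probably because the same members post more than once in the thread (though different members can also list identical vehicles, so deduplicating by text may merge distinct users). If each vehicle string should appear only once, one option is an order-preserving dedupe. The sample list below is a subset of the output above, used for illustration only.

```python
# A few lines taken from the output above, including repeats.
scraped = [
    "2019 Ford Ranger XLT FX4",
    "2019 Ford Ranger Lariat FX4, 1973 Mercury Capri",
    "Tahoe/Tundra/Fusion",
    "2019 Ford Ranger Lariat FX4, 1973 Mercury Capri",
    "2019 Ford Ranger XLT FX4",
]

# dict keys keep insertion order (Python 3.7+), so this drops repeats
# while preserving the order in which each string first appeared.
unique = list(dict.fromkeys(scraped))
print(unique)
```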
