
How to extract a description from an HTML paragraph using Python

I want to extract just the description paragraph from the HTML source, but my code also returns the colour and style ID data.

import requests
from bs4 import BeautifulSoup

url = "https://www.nike.com/gb/t/air-max-viva-shoe-ZQTSV8/DB5268-003"

response = requests.get(url)

soup = BeautifulSoup(response.text, 'lxml')

description = soup.find(
    'div', {'class': 'description-preview body-2 css-1pbvugb'}).text
print(description)

Just chain .find("p") after it.

description = soup.find('div', {'class':'description-preview body-2 css-1pbvugb'}).find("p").text

It looks like you want the text of the next <p>:

description = soup.find('div', {'class':'description-preview body-2 css-1pbvugb'}).find_next('p').text
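To see why .text on the whole div also returns the colour and style lines, here is a minimal sketch. The markup is a hypothetical stand-in loosely modelled on the page in the question (the live page may differ), and it uses the built-in html.parser only so that the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a description paragraph followed by a details list.
html = """
<div class="description-preview body-2 css-1pbvugb">
  <p>Designed with every woman in mind.</p>
  <ul>
    <li>Colour Shown: Black/White</li>
    <li>Style: DB5268-003</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "description-preview body-2 css-1pbvugb"})

# .text concatenates every text node under the div, list items included.
whole_text = div.text

# .find("p") only looks at descendants of the div.
first_p = div.find("p")

# .find_next("p") walks forward in document order; here it lands on the same tag.
next_p = div.find_next("p")

# Guard against a missing tag instead of calling .text on None.
paragraph = first_p.get_text(strip=True) if first_p else None
print(paragraph)
```

If the div had no <p> inside it, find("p") would return None, while find_next("p") would keep going and return the first <p> that appears later in the document, which may not be the one you want.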

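If the description is the only thing you need from the page, you don't need a full HTML parser here, since a parser builds the entire document tree in memory.

You can compare the running time of the regex approach against the bs4 parser.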
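One rough way to run that comparison is with timeit. This is only a sketch under assumptions: the page is a synthetic string built for the test (not the real Nike page), with_regex and with_bs4 are hypothetical helper names, and the numbers will vary by machine and parser.

```python
import re
import timeit

from bs4 import BeautifulSoup

# Hypothetical page: lots of filler markup plus the one paragraph we care about.
page = (
    "<html><body>"
    + "<div class='filler'><span>x</span></div>" * 2000
    + "<div class='description-preview'><p>Plush collar.</p></div>"
    + "</body></html>"
)


def with_regex(text):
    # Scan the raw string; no document tree is built.
    m = re.search(r"description-preview[^>]*>\s*<p>(.+?)</p>", text, re.S)
    return m.group(1) if m else None


def with_bs4(text):
    # Parse the whole document, then select the paragraph.
    soup = BeautifulSoup(text, "html.parser")
    p = soup.select_one(".description-preview p")
    return p.get_text() if p else None


regex_time = timeit.timeit(lambda: with_regex(page), number=5)
bs4_time = timeit.timeit(lambda: with_bs4(page), number=5)
print(f"regex: {regex_time:.4f}s  bs4: {bs4_time:.4f}s")
```

Keep in mind that a regex like this is tied to the exact markup, so it trades robustness for speed.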

Here is a quick way to capture it:

import re
import requests

r = requests.get(
    'https://www.nike.com/gb/t/air-max-viva-shoe-ZQTSV8/DB5268-003')

match = re.search(r'descriptionPreview\":\"(.+?)\"', r.text).group(1)
print(match)

Output:

Designed with every woman in mind, the mixed material upper of the Nike Air Max Viva 
features a plush collar, detailed patterning and intricate stitching. The new lacing 
system uses 2 separate laces constructed from heavy-duty tech chord, letting you find the perfect fit. Mixing comfort with style, it combines Nike Air with a lifted foam 
heel for and unbelievable ride that looks as good as it feels.
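The one-liner above raises AttributeError if the key is missing, and it stops at the first escaped quote inside the value. A slightly more defensive sketch follows, assuming the description sits in a JSON string embedded in the page. The sample text and the extract_preview helper are hypothetical, used here so that the snippet runs offline.

```python
import json
import re

# Hypothetical stand-in for r.text: a fragment of JSON embedded in the page.
sample = (
    '{"title":"Shoe","descriptionPreview":'
    '"A \\"plush\\" collar \\u0026 more.","colour":"Black"}'
)

# Body of a JSON string: ordinary characters or backslash escape pairs.
PATTERN = re.compile(r'"descriptionPreview"\s*:\s*"((?:[^"\\]|\\.)*)"')


def extract_preview(page_text):
    """Return the decoded descriptionPreview value, or None if it is absent."""
    match = PATTERN.search(page_text)
    if match is None:
        return None
    # Re-wrap in quotes so json.loads decodes \" and \uXXXX escapes.
    return json.loads('"' + match.group(1) + '"')


preview = extract_preview(sample)
print(preview)
```

With the real page you would call extract_preview(r.text) and check for None before printing.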

If you would rather use bs4, here is a short version:

soup = BeautifulSoup(r.text, 'lxml')
print(soup.select_one('.description-preview').p.string)

Note: I used the lxml parser because, according to the bs4 documentation, it is the fastest of the supported parsers.
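One caveat with .p.string: it returns None when the <p> contains nested tags, so get_text() is usually the safer choice. A small sketch on hypothetical markup (html.parser is used here only to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph with an inline tag inside it.
html = '<div class="description-preview"><p>Lifted <b>foam</b> heel.</p></div>'

soup = BeautifulSoup(html, "html.parser")
p = soup.select_one(".description-preview p")

# .string is None because <p> has more than one child node.
as_string = p.string

# get_text() joins all nested text nodes.
as_text = p.get_text()

print(as_string)
print(as_text)
```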

