How to extract the description from an HTML paragraph using Python

Question

I want to extract the paragraph text from the HTML source, but my code is also picking up the colour and id data.

import requests
from bs4 import BeautifulSoup

url = "https://www.nike.com/gb/t/air-max-viva-shoe-ZQTSV8/DB5268-003"

response = requests.get(url)

soup = BeautifulSoup(response.text, 'lxml')

description = soup.find(
    'div', {'class': 'description-preview body-2 css-1pbvugb'}).text
print(description)

Answer 1

Just chain .find("p") after it.

description = soup.find('div', {'class':'description-preview body-2 css-1pbvugb'}).find("p").text
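To see why this helps, here is a minimal sketch on a made-up HTML fragment. The markup, including the colour and style list, is an assumption for illustration and is not copied from the real page. Calling .text on the whole div concatenates the text of every descendant, while .find("p") narrows the result to the paragraph. The built-in html.parser is used here only so the sketch runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a description div holding a paragraph plus a list
# with colour and style details (not copied from the real page).
SAMPLE_HTML = """
<div class="description-preview body-2 css-1pbvugb">
  <p>Designed with every woman in mind.</p>
  <ul>
    <li>Colour Shown: Black/White</li>
    <li>Style: DB5268-003</li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", {"class": "description-preview body-2 css-1pbvugb"})

# .text on the div includes the list items as well as the paragraph.
whole_text = container.text

# Narrowing to the first <p> inside the div keeps only the description.
paragraph = container.find("p")
description = paragraph.text if paragraph is not None else None

print(description)
```

The None check is optional, but it avoids an AttributeError if the div turns out to contain no paragraph.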

Answer 2

It looks like you want the text of the next <p>:

description = soup.find('div', {'class':'description-preview body-2 css-1pbvugb'}).find_next('p').text
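One difference worth knowing: find() searches only inside the div, while find_next() continues through the rest of the document. The sketch below uses a made-up fragment (an assumption, not the real page) in which the target div has no paragraph of its own, so the two calls give different results.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first div has no <p> of its own; a later,
# unrelated div does.
SAMPLE_HTML = """
<div class="description-preview"><span>No paragraph here</span></div>
<div class="footer"><p>Unrelated footer text</p></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", class_="description-preview")

# find() only searches descendants of the div, so it returns None here.
inside = container.find("p")

# find_next() walks forward through the rest of the document, so it
# returns the footer paragraph even though it is outside the div.
following = container.find_next("p")

print(inside)
print(following.text)
```

When the paragraph is known to sit inside the div, both calls return the same element, so the choice only matters when the page structure may vary.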

Answer 3

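If that is the only thing you need from the page, you don't need a real parser in this case, because a parser loads the entire document into memory.

You can compare the running time of the regex approach against the bs4 parser.

Here is a quick capture: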

import re
import requests

r = requests.get(
    'https://www.nike.com/gb/t/air-max-viva-shoe-ZQTSV8/DB5268-003')

match = re.search(r'descriptionPreview\":\"(.+?)\"', r.text).group(1)
print(match)

Output:

Designed with every woman in mind, the mixed material upper of the Nike Air Max Viva 
features a plush collar, detailed patterning and intricate stitching. The new lacing 
system uses 2 separate laces constructed from heavy-duty tech chord, letting you find the perfect fit. Mixing comfort with style, it combines Nike Air with a lifted foam 
heel for and unbelievable ride that looks as good as it feels.
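Two things can go wrong with a capture like this: re.search returns None when the key is missing, and the captured text is still a raw JSON string, so escapes such as \u0026 stay encoded. The following is a hedged variant rather than the answer's exact pattern. The helper name extract_description_preview and the sample fragment are hypothetical, and the pattern is adjusted so that escaped quotes inside the value do not end the match early.

```python
import json
import re

# Matches the JSON string value of "descriptionPreview", allowing
# backslash-escaped characters (including \") inside the value.
PATTERN = re.compile(r'"descriptionPreview":"((?:[^"\\]|\\.)*)"')


def extract_description_preview(page_source):
    """Return the decoded descriptionPreview value, or None if absent."""
    match = PATTERN.search(page_source)
    if match is None:
        return None
    # Re-wrap the captured text in quotes so json.loads decodes escapes.
    return json.loads('"' + match.group(1) + '"')


# Hypothetical page fragment, not the real Nike response.
sample = r'<script>{"descriptionPreview":"Soft \u0026 light, a \"plush\" collar.","id":"x"}</script>'

print(extract_description_preview(sample))
print(extract_description_preview("<html></html>"))
```

With the real page, r.text from the request above would be passed in place of sample.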

If you would like to use bs4:

Here is a short usage:

soup = BeautifulSoup(r.text, 'lxml')
print(soup.select_one('.description-preview').p.string)

Note: I used the lxml parser because, according to the bs4 documentation, it is the fastest parser.
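The timing comparison suggested in this answer could be sketched roughly as follows. The page content is made up and padded with filler markup, the function names are hypothetical, and html.parser stands in for lxml only so the sketch has no extra dependency; swap in 'lxml' if it is installed. Absolute numbers will vary by machine and page size.

```python
import re
import timeit

from bs4 import BeautifulSoup

# Hypothetical page: a description paragraph plus the same text embedded
# in a JSON blob, padded with filler markup so parsing has work to do.
FILLER = "<div><span>filler</span></div>" * 500
PAGE = (
    "<html><body>" + FILLER
    + '<div class="description-preview"><p>Plush collar.</p></div>'
    + '<script>{"descriptionPreview":"Plush collar."}</script>'
    + "</body></html>"
)


def with_regex():
    # Scan the raw text for the JSON key; no document tree is built.
    return re.search(r'descriptionPreview":"(.+?)"', PAGE).group(1)


def with_bs4():
    # Parse the whole page into a tree, then select the paragraph.
    soup = BeautifulSoup(PAGE, "html.parser")
    return soup.select_one(".description-preview").p.string


# Both approaches should agree on the extracted text.
assert with_regex() == with_bs4()

regex_seconds = timeit.timeit(with_regex, number=20)
bs4_seconds = timeit.timeit(with_bs4, number=20)
print(f"regex: {regex_seconds:.4f}s  bs4: {bs4_seconds:.4f}s")
```

The regex route skips building a tree, but it depends on the exact text of the embedded JSON, so it is more brittle if the page layout changes.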

3 answers

Answer 1
1 2021-03-02 06:42:48

Answer 2
1 2021-03-02 06:45:10

Answer 3
1 2021-03-02 06:46:31
