How to extract a description from an HTML paragraph using Python

I want to extract the description paragraph from the HTML source, but my code also returns the colour and style ID data along with it.

import requests
from bs4 import BeautifulSoup

url = "https://www.nike.com/gb/t/air-max-viva-shoe-ZQTSV8/DB5268-003"

response = requests.get(url)

soup = BeautifulSoup(response.text, 'lxml')

description = soup.find(
    'div', {'class': 'description-preview body-2 css-1pbvugb'}).text
print(description)

Just chain .find("p") after it.

description = soup.find('div', {'class':'description-preview body-2 css-1pbvugb'}).find("p").text

It seems you want the text of the next <p>:

description = soup.find('div', {'class':'description-preview body-2 css-1pbvugb'}).find_next('p').text
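As a minimal sketch of why the paragraph lookup helps, the snippet below runs against a small hypothetical HTML fragment (the live Nike markup may differ and can change at any time). It uses the built-in "html.parser" backend and a single class name so that it does not depend on lxml or on the generated css-1pbvugb class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the question.
SAMPLE_HTML = """
<div class="description-preview body-2 css-1pbvugb">
  <p>Designed with every woman in mind.</p>
  <ul>
    <li>Colour Shown: Black/White</li>
    <li>Style: DB5268-003</li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matching on one class is less brittle than matching the full class string.
div = soup.find("div", class_="description-preview")

# .text on the div concatenates every descendant, including the list items.
full_text = div.text

# Restricting the lookup to the first <p> drops the colour and style lines.
paragraph = div.find("p").text

# find_next("p") walks forward in document order and reaches the same tag here.
next_paragraph = div.find_next("p").text

print(paragraph)
```

In this fragment find("p") and find_next("p") return the same tag; they would differ only if the <p> sat after the div rather than inside it.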

If that is your only target on the page, you do not need a full HTML parser, since a parser loads the entire document into memory.

You can compare the run time of the regex approach against the bs4 parser.

Below is a quick example:

import re
import requests

r = requests.get(
    'https://www.nike.com/gb/t/air-max-viva-shoe-ZQTSV8/DB5268-003')

match = re.search(r'descriptionPreview\":\"(.+?)\"', r.text).group(1)
print(match)

Output:

Designed with every woman in mind, the mixed material upper of the Nike Air Max Viva 
features a plush collar, detailed patterning and intricate stitching. The new lacing 
system uses 2 separate laces constructed from heavy-duty tech chord, letting you find the perfect fit. Mixing comfort with style, it combines Nike Air with a lifted foam 
heel for and unbelievable ride that looks as good as it feels.
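The regex approach can be made a little more defensive. The sketch below assumes the page embeds the description as a JSON string under a descriptionPreview key, as the pattern above implies; the sample text and the extract_description helper are hypothetical. It returns None when nothing matches instead of raising, and it decodes JSON escapes such as \u0026.

```python
import json
import re

# Hypothetical stand-in for r.text; the real page content may differ.
SAMPLE_PAGE = '{"descriptionPreview":"A plush collar \\u0026 intricate stitching.","colour":"Black"}'

# Capture a JSON string body, allowing backslash-escaped characters inside it.
PATTERN = re.compile(r'"descriptionPreview":"((?:[^"\\]|\\.)*)"')


def extract_description(page_text):
    """Return the decoded description, or None if the key is not found."""
    match = PATTERN.search(page_text)
    if match is None:
        return None
    # Re-wrap the captured body in quotes so json.loads decodes the escapes.
    return json.loads('"' + match.group(1) + '"')


print(extract_description(SAMPLE_PAGE))
```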

If you would rather use bs4, here is a short example:

soup = BeautifulSoup(r.text, 'lxml')
print(soup.select_one('.description-preview').p.string)

Note: I used the lxml parser because, according to the bs4 documentation, it is the fastest parser.
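To check the timing claim yourself, a rough comparison could look like the sketch below. It runs on a synthetic page rather than the live site, the with_regex and with_bs4 helpers are hypothetical, and it defaults to the built-in "html.parser" backend; pass "lxml" instead if it is installed. Absolute numbers will vary by machine and page size.

```python
import re
import timeit

from bs4 import BeautifulSoup

# Synthetic page: filler markup followed by the element we want.
SAMPLE_PAGE = (
    "<html><body>"
    + "<div class='filler'><p>filler</p></div>" * 200
    + "<div class='description-preview'><p>Plush collar and intricate stitching.</p></div>"
    + "</body></html>"
)


def with_regex(page):
    """Pull the paragraph text with a regex tied to this exact markup."""
    match = re.search(r"description-preview'><p>(.+?)</p>", page)
    return match.group(1) if match else None


def with_bs4(page, parser="html.parser"):
    """Parse the whole page, then select the paragraph by CSS selector."""
    soup = BeautifulSoup(page, parser)
    node = soup.select_one(".description-preview p")
    return node.get_text() if node else None


# Time 20 runs of each approach on the same input.
regex_seconds = timeit.timeit(lambda: with_regex(SAMPLE_PAGE), number=20)
bs4_seconds = timeit.timeit(lambda: with_bs4(SAMPLE_PAGE), number=20)
print(f"regex: {regex_seconds:.4f}s, bs4: {bs4_seconds:.4f}s")
```

The regex is quicker to run but depends on the exact markup, while the parser tolerates attribute order and whitespace changes.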
