Python BeautifulSoup: extracting text from SPAN and A tags

Question

I want to extract the text from the SPAN and A tags and put it into lists that follow this pattern:

['Farina', '500 g']['Uova', '1']['Sale', '100 g']

I am scraping with BeautifulSoup:

from bs4 import BeautifulSoup
import re
import string

markup = """
<dd class="ingredient">
    <a href="#">Farina</a>
    <span>500 g</span>
</dd>
<dd class="ingredient">
    <a href="#">Uova</a>
    <span>1</span>
</dd>
<dd class="ingredient">
    <a href="#">Sale</a>
    <span>100 g</span>
</dd>
"""

soup = BeautifulSoup(markup, 'html.parser')

allIngredients = []
for tag in soup.find_all(attrs={'class' : 'ingredient'}):
    #[tag.text for tag in tags]
    link = tag.a.get('href')
    nameIngredient = tag.a.string

    contents = tag.span.text
    quantityIngredient = re.sub(r"\s+", " ", contents).strip()
    allIngredients.append([nameIngredient, quantityIngredient])

print(allIngredients)

Sometimes the SPAN is empty or missing altogether.

Answer 1

Here is a solution that uses lxml (instead of bs4):

from lxml import html

markup = """
<dd class="ingredient">
    <a href="#">Farina</a>
    <span>500 g</span>
</dd>
<dd class="ingredient">
    <a href="#">Uova</a>
    <span>1</span>
</dd>
<dd class="ingredient">
    <a href="#">Sale</a>
    <span>100 g</span>
</dd>
<dd class="ingredient">
    <a href="#">Vino</a>
</dd>
"""

root = html.fromstring(markup)
result = []
for node in root.xpath(".//dd"):
    a = node.xpath(".//a")
    span = node.xpath(".//span")
    result.append((
        a[0].text_content() if a else None, 
        span[0].text_content() if span else None
    ))


print(result)
# [('Farina', '500 g'), ('Uova', '1'), ('Sale', '100 g'), ('Vino', None)]
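
Since the question asks about BeautifulSoup, the same idea can also be sketched with bs4 alone. This is only a minimal sketch, not part of the answer above: the helper name extract_ingredients is hypothetical, and treating an empty SPAN the same as a missing one (both become None) is an assumption about what the asker wants.

```python
from bs4 import BeautifulSoup

markup = """
<dd class="ingredient">
    <a href="#">Farina</a>
    <span>500 g</span>
</dd>
<dd class="ingredient">
    <a href="#">Uova</a>
    <span></span>
</dd>
<dd class="ingredient">
    <a href="#">Vino</a>
</dd>
"""


def extract_ingredients(html_text):
    """Return [name, quantity] pairs; quantity is None if the SPAN is empty or absent.

    Hypothetical helper written for illustration only.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    rows = []
    for dd in soup.find_all("dd", class_="ingredient"):
        a = dd.find("a")        # None when the tag is absent
        span = dd.find("span")  # None when the tag is absent
        # Collapse runs of whitespace, mirroring the re.sub() in the question.
        name = " ".join(a.get_text().split()) if a else None
        quantity = " ".join(span.get_text().split()) if span else None
        # Assumption: an empty string is reported as None as well.
        rows.append([name or None, quantity or None])
    return rows


print(extract_ingredients(markup))
# [['Farina', '500 g'], ['Uova', None], ['Vino', None]]
```

Checking the result of find() before reading its text avoids the AttributeError that tag.span.text raises in the question's code when the SPAN does not exist.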

Solution 1
0 Accepted 2021-09-28 16:31:53
