Retrieving attribute values by iterating with BeautifulSoup

Question

I am scraping HTML saved in a file using the following code:

from bs4 import BeautifulSoup as bs

path_xml = r"..."

content = []

with open(path_xml, "r") as file:
    content = file.readlines()

content = "".join(content)
bs_content = bs(content, "html.parser")

bilder = bs_content.find_all("bilder")

def get_str_bild(match):
    test = match.findChildren("b")

    for x in range(len(test)): # here is the problem (not giving me all elements in test)
 
        return test[x].get("d")

for b in bilder:
    if b.b: 
        print(get_str_bild(b))

Output:

L3357U00_002120.jpg
L3357U00_002140.jpg
L3357U00_002160.jpg

Basically, there are 3 places in the XML file where the node "bilder" has child nodes. Each block looks like this:

<Bilder>
    <B Nr="1" D="L3357U00_002120.jpg"/>
    <B Nr="2" D="L3357U00_002120.jpg"/>
    <B Nr="3" D="L3357U00_002120.jpg"/>
    <B Nr="4" D="L3357U00_002120.jpg"/>
    <B Nr="9" D="L3357U00_002120.jpg"/>
    <B Nr="1" D="L3357U00_002130.jpg"/>
    <B Nr="2" D="L3357U00_002130.jpg"/>
    <B Nr="3" D="L3357U00_002130.jpg"/>
    <B Nr="4" D="L3357U00_002130.jpg"/>
    <B Nr="9" D="L3357U00_002130.jpg"/>
</Bilder>

Currently it returns only the first picture of each block, and I want it to return all of them.

What am I doing wrong here?

Answer 1

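You need to fix the get_str_bild(match) function. It currently returns only the first d attribute, because the return statement inside the for loop exits the function on the first iteration.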
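A minimal sketch in plain Python of why only the first value comes back. The function names first_only and all_values and the sample list are hypothetical and only illustrate the control flow; they are not part of the question's code.

```python
def first_only(items):
    # `return` exits the function during the first pass through the loop,
    # so the remaining items are never visited.
    for item in items:
        return item


def all_values(items):
    # Collect every item first, then return once after the loop has finished.
    collected = []
    for item in items:
        collected.append(item)
    return collected


sample = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical file names
print(first_only(sample))
print(all_values(sample))
```

The first call prints a single name, the second prints the whole list.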

Replace your function with:

def get_str_bild(match):
    test = match.find_all("b")
    
    elements = []
    for x in range(len(test)):
        elements.append(test[x].get("d"))

    return elements
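The same function can likely be written more compactly with a list comprehension. The sketch below is self-contained: the inline sample string is a shortened stand-in for one block of the file from the question, not the real data. Note that html.parser lowercases tag and attribute names, which is why "bilder", "b" and "d" match <Bilder>, <B> and D=.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for one <Bilder> block from the question.
sample = """
<Bilder>
    <B Nr="1" D="L3357U00_002120.jpg"/>
    <B Nr="2" D="L3357U00_002120.jpg"/>
    <B Nr="1" D="L3357U00_002130.jpg"/>
</Bilder>
"""


def get_str_bild(match):
    # One "d" value per <b> element inside the block, in document order.
    return [b.get("d") for b in match.find_all("b")]


soup = BeautifulSoup(sample, "html.parser")
block = soup.find("bilder")
print(get_str_bild(block))
```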

Answer 2

You are missing the loop over the b elements of your bilder nodes. You can drop the function and simplify the code as follows:

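pic_1 = "L3357U00_002120.jpg"

# Reuses `bs` and `content` from the question's code.
bs_content = bs(content, "html.parser")
for i, bilder in enumerate(bs_content.find_all("bilder")):
    print(f'bilder {i}')
    for b in bilder.find_all('b'):
        # Compare against the file name in "d"; "nr" only holds the picture number.
        if b['d'] == pic_1:
            print(b['d'])
            #break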
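If the goal is simply every file name in every block, as the question asks, the filter can be dropped altogether. A minimal sketch, assuming two shortened blocks as sample input: the <Root> wrapper, the sample string and the name names_per_block are illustrative and do not come from the original file.

```python
from bs4 import BeautifulSoup

# Two shortened <Bilder> blocks standing in for the file from the question.
sample = """
<Root>
  <Bilder>
    <B Nr="1" D="L3357U00_002120.jpg"/>
    <B Nr="2" D="L3357U00_002130.jpg"/>
  </Bilder>
  <Bilder>
    <B Nr="1" D="L3357U00_002140.jpg"/>
  </Bilder>
</Root>
"""

soup = BeautifulSoup(sample, "html.parser")

names_per_block = []
for i, bilder in enumerate(soup.find_all("bilder")):
    # Every "d" attribute of the <b> elements inside this block.
    names = [b["d"] for b in bilder.find_all("b")]
    names_per_block.append(names)
    print(f"bilder {i}: {names}")
```

Keeping one list per block preserves which pictures belong together; flatten the lists if only the file names matter.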

2 answers

Answer 1: score 0, 2023-01-28 13:54:52
Answer 2: score 0, accepted, 2023-01-28 14:04:43
