Get a specific tag - BeautifulSoup
Below is the XML that I'm trying to parse.
<url>
<loc>https://www.houseofindya.com/aqua-chanderi-pleated-sharara-pants-177/iprdt</loc>
<image:image>
<image:loc>https://img.faballey.com/Images/Product/IPL00325Z/d3.jpg</image:loc>
<image:title>Green Chanderi Pleated Sharara Pants</image:title>
</image:image>
<priority>0.8</priority>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://www.houseofindya.com/aqua-foil-chanderi-kurta-171/iprdt</loc>
<image:image>
<image:loc>https://img.faballey.com/Images/Product/ITN01710Z/d3.jpg</image:loc>
<image:title>Aqua Foil Chanderi Kurta</image:title>
</image:image>
<priority>0.8</priority>
<changefreq>daily</changefreq>
</url>
I need to get the text of only the <loc> tags. So, I do the following:
soup = BeautifulSoup(xml, features='xml')
loc = soup.find('loc')
while loc is not None:
    url = loc.text
    yield url
    loc = loc.find_next('loc')
The result I get is:
https://www.houseofindya.com/aqua-chanderi-pleated-sharara-pants-177/iprdt
https://img.faballey.com/Images/Product/IPL00325Z/d3.jpg
https://www.houseofindya.com/aqua-foil-chanderi-kurta-171/iprdt
https://img.faballey.com/Images/Product/ITN01710Z/d3.jpg
However, what I want is only https://www.houseofindya.com/aqua-chanderi-pleated-sharara-pants-177/iprdt and https://www.houseofindya.com/aqua-foil-chanderi-kurta-171/iprdt. I don't want the text of <image:loc>.
What am I missing here?
You can use CSS selectors even in an XML document, so select every <loc> that is a direct child of <url> with url > loc:
from bs4 import BeautifulSoup
xml_doc = """
... your XML from question here ...
"""
soup = BeautifulSoup(xml_doc, "html.parser")
for loc in soup.select("url > loc"):
    print(loc.text)
Prints:
https://www.houseofindya.com/aqua-chanderi-pleated-sharara-pants-177/iprdt
https://www.houseofindya.com/aqua-foil-chanderi-kurta-171/iprdt
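If you would rather keep a generator like the one in the question, the same restriction can be expressed without CSS selectors by searching only the direct children of each <url>. The following is a minimal sketch under the same html.parser setup as the snippet above; iter_page_urls is a hypothetical helper name, not something from the question or from BeautifulSoup.

```python
from bs4 import BeautifulSoup

# XML from the question.
xml_doc = """
<url>
<loc>https://www.houseofindya.com/aqua-chanderi-pleated-sharara-pants-177/iprdt</loc>
<image:image>
<image:loc>https://img.faballey.com/Images/Product/IPL00325Z/d3.jpg</image:loc>
<image:title>Green Chanderi Pleated Sharara Pants</image:title>
</image:image>
<priority>0.8</priority>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://www.houseofindya.com/aqua-foil-chanderi-kurta-171/iprdt</loc>
<image:image>
<image:loc>https://img.faballey.com/Images/Product/ITN01710Z/d3.jpg</image:loc>
<image:title>Aqua Foil Chanderi Kurta</image:title>
</image:image>
<priority>0.8</priority>
<changefreq>daily</changefreq>
</url>
"""


def iter_page_urls(markup):
    """Yield the text of each <loc> that is a direct child of a <url>.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(markup, "html.parser")
    for url in soup.find_all("url"):
        # recursive=False limits the search to direct children of <url>,
        # so the <image:loc> nested inside <image:image> is never visited.
        loc = url.find("loc", recursive=False)
        if loc is not None:
            yield loc.get_text(strip=True)


page_urls = list(iter_page_urls(xml_doc))
for page_url in page_urls:
    print(page_url)
```

The loop in the question uses find_next('loc'), which walks the rest of the document in order rather than staying inside one level of a <url>, and in the output shown there the <image:loc> elements were matched as well. Limiting the search to direct children avoids that.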