Iterate through elements in an HTML tree using BeautifulSoup and produce output that maintains the relative position of each element (Python)
I have this code, using Jsoup in Java, that does what I need:
Elements htmlTree = doc.body().select("*");
Elements menuElements = new Elements();
for (Element element : htmlTree) {
    if (element.hasClass("header"))
        menuElements.add(element);
    if (element.hasClass("name"))
        menuElements.add(element);
    if (element.hasClass("quantity"))
        menuElements.add(element);
}
I want to do the same thing in Python using BeautifulSoup. An example of the HTML tree I'm trying to scrape follows:
<div class="header"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="header"> content2 </div>
<div class="name"> content2 </div>
<div class="quantity"> content2 </div>
<div class="name"> content2 </div>
<div class="quantity"> content2 </div>
etc.
Basically, I want the output to preserve the relative position of each element. How would I go about doing that using Python and BeautifulSoup?
EDIT:
This is the Python code I have (it's very naive), but maybe it can help:
output = []
for e in soup:
    if e["class"] == "pickmenucolmenucat":
        output.append(e)
    if e["class"] == "pickmenucoldispname":
        output.append(e)
    if e["class"] == "pickmenucolportions":
        output.append(e)
To find all <div> elements whose class attribute matches any value in a given list:
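#!/usr/bin/env python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

with open('input.xml', 'rb') as file:
    # Name the parser explicitly to avoid bs4's "no parser specified" warning.
    soup = BeautifulSoup(file, 'html.parser')

# A list passed to class_ matches elements that have any of the listed classes;
# find_all() returns the matches in document order.
elements = soup.find_all("div", class_="header name quantity".split())
print("\n".join("{} {}".format(el['class'], el.get_text()) for el in elements))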
['header'] content
['name'] content
['quantity'] content
['name'] content
['quantity'] content
['header'] content2
['name'] content2
['quantity'] content2
['name'] content2
['quantity'] content2
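For comparison, here is a minimal sketch that mirrors the Java loop from the question more literally: it visits every tag in document order and keeps those carrying one of the wanted classes. The names HTML, WANTED and collect_in_order are made up for illustration, the markup is a shortened copy of the sample from the question, and html.parser is an assumed parser choice. Note that BeautifulSoup returns the class attribute as a list of class names, which is why comparing e["class"] to a plain string, as in the naive code, never matches.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Shortened copy of the sample markup from the question (illustrative only).
HTML = """
<div class="header"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="header"> content2 </div>
<div class="name"> content2 </div>
<div class="quantity"> content2 </div>
"""

# Hypothetical set of class names to keep.
WANTED = {"header", "name", "quantity"}


def collect_in_order(markup):
    """Return (class, text) pairs for the wanted elements, in document order."""
    soup = BeautifulSoup(markup, "html.parser")  # parser choice is an assumption
    result = []
    # find_all(True) yields every tag in document order, much like select("*") in Jsoup.
    for el in soup.find_all(True):
        # el.get("class") is a list of class names, or None when the attribute is absent.
        matched = WANTED.intersection(el.get("class") or [])
        if matched:
            result.append((sorted(matched)[0], el.get_text(strip=True)))
    return result


pairs = collect_in_order(HTML)
for cls, text in pairs:
    print(cls, text)
```

Because the elements are appended as they are encountered, each name and quantity stays after the header it belongs to.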
There are also other methods that allow you to search and traverse HTML elements.
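One such alternative is a CSS selector passed to select(). The following is a sketch rather than a definitive solution: it assumes a reasonably recent bs4 (where select() is backed by the soupsieve package), uses html.parser as an assumed parser, and the HTML string is a made-up stand-in for the real page.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Made-up stand-in for the page being scraped.
HTML = """
<div class="header"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="header"> content2 </div>
<div class="name"> content2 </div>
<div class="quantity"> content2 </div>
"""

soup = BeautifulSoup(HTML, "html.parser")  # parser choice is an assumption

# A comma-separated selector list matches elements satisfying any of the selectors;
# the matches come back in the order they appear in the document.
selected = soup.select("div.header, div.name, div.quantity")

# Each element's class attribute is a list; take its first entry for display.
classes = [el["class"][0] for el in selected]
print(classes)
```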