
Iterate through elements in an HTML tree using BeautifulSoup, and produce an output that maintains the relative position of each element, in Python

I have this code that does what I need, using Jsoup in Java:

Elements htmlTree = doc.body().select("*");
Elements menuElements = new Elements();

for (Element element : htmlTree) {
    if (element.hasClass("header"))
        menuElements.add(element);
    if (element.hasClass("name"))
        menuElements.add(element);
    if (element.hasClass("quantity"))
        menuElements.add(element);
}

I want to do the same thing in Python using BeautifulSoup. An example of the HTML tree I'm trying to scrape follows:

<div class="header"> content </div>
     <div class="name"> content </div>
     <div class="quantity"> content </div>
     <div class="name"> content </div>
     <div class="quantity"> content </div>
<div class="header"> content2 </div>
     <div class="name"> content2 </div>
     <div class="quantity"> content2 </div>
     <div class="name"> content2 </div>
     <div class="quantity"> content2 </div>

etc.

Basically, I want the output to preserve the relative position of each element. How would I go about doing that using Python and BeautifulSoup?

EDIT:

This is the Python code I have (it's very naive), but maybe it can help:

output = []

for e in soup :
  if e["class"] == "pickmenucolmenucat" :
    output.append(e)
  if e["class"] == "pickmenucoldispname" :
    output.append(e)
  if e["class"] == "pickmenucolportions" :
    output.append(e)

To find all <div> elements whose class attribute matches any class in a given list:

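#!/usr/bin/env python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

# Name the parser explicitly; without it bs4 picks one and emits a warning.
with open('input.xml', 'rb') as file:
    soup = BeautifulSoup(file, 'html.parser')

# A list passed to class_ matches any of the listed classes.
# find_all() returns the matches in document order.
elements = soup.find_all("div", class_="header name quantity".split())
print("\n".join("{} {}".format(el['class'], el.get_text()) for el in elements))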

Output:

['header']  content 
['name']  content 
['quantity']  content 
['name']  content 
['quantity']  content 
['header']  content2 
['name']  content2 
['quantity']  content2 
['name']  content2 
['quantity']  content2 
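
For comparison, the Jsoup loop from the question can be ported almost line for line. The sketch below is only an illustration: the inline `html` string, the `WANTED` set and the `menu_elements` name are assumptions made for the example, not part of the original code. Note that BeautifulSoup treats `class` as a multi-valued attribute and returns it as a list, so a comparison such as `e["class"] == "name"` never matches; testing membership works instead.

```python
from bs4 import BeautifulSoup

# Shortened sample markup based on the question (assumed for illustration).
html = """
<div class="header"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="header"> content2 </div>
<div class="name"> content2 </div>
<div class="quantity"> content2 </div>
"""

# Hypothetical name: the classes we want to keep.
WANTED = {"header", "name", "quantity"}

soup = BeautifulSoup(html, "html.parser")

menu_elements = []
# find_all(True) yields every tag in document order, like select("*") in Jsoup.
for element in soup.find_all(True):
    # "class" is a list of class names; tags without it get an empty list.
    classes = element.get("class", [])
    if WANTED.intersection(classes):
        menu_elements.append(element)

order = [el["class"][0] for el in menu_elements]
print(order)
```

Because the tags are visited in document order, each header stays ahead of the name and quantity entries that follow it in the page.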

There are also other methods that allow you to search and traverse HTML elements.
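
One such alternative is a CSS selector passed to `select()`, which is close in spirit to the Jsoup `select` call in the question. This is a minimal sketch, assuming a recent beautifulsoup4 release (CSS selector support is provided by the soupsieve package that ships with it); the inline `html` string and the `pairs` name are made up for the example.

```python
from bs4 import BeautifulSoup

# Shortened sample markup based on the question (assumed for illustration).
html = """
<div class="header"> content </div>
<div class="name"> content </div>
<div class="quantity"> content </div>
<div class="header"> content2 </div>
<div class="name"> content2 </div>
"""

soup = BeautifulSoup(html, "html.parser")

# A comma-separated selector group matches any of the three classes.
selected = soup.select(".header, .name, .quantity")

# Pair each element's first class with its stripped text.
pairs = [(el["class"][0], el.get_text(strip=True)) for el in selected]
for css_class, text in pairs:
    print(css_class, text)
```

The matches should come back in the order they appear in the document, so the relative position of each header and its items is kept.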
