Beautiful Soup and Python, nested elements

Question

I am attempting to scrape nested elements with BeautifulSoup and have been pulling my hair out for a couple of days now. I am very much a novice, so I hope the simplicity of this question does not offend anyone. Any help at all would be greatly appreciated.

Here is the HTML I'm attempting to scrape.

        <div id="specs" class="pane">
           <div class="col">
              <ul class="list">
                 <li>
                    <ul>
                       <li><b>width</b>2</li>
                       <li><b>length</b>1</li>
                       <li><b>color</b>blue</li>
                       <li><b>metal</b>steel</li>
                    </ul>
                 </li>
              </ul>
           </div>
        </div>

And in a perfect world, here is my result:

width, 2
length, 1
color, blue
metal, steel

While I've come close, I know this can't be the answer, and I still can't seem to loop through the li elements.

div = div.find("div", {"id":"specifications"})
result = [i for i in div.find('li')]

If anyone can push a beginner in the right direction, it would be greatly appreciated. Thank you in advance for any insight!

Answer 1

You can use a CSS selector via select() to find the target b elements, for example:

from bs4 import BeautifulSoup
raw = '''<div id="specs" class="pane">
           <div class="col">
              <ul class="list">
                 <li>
                    <ul>
                       <li><b>width</b>2</li>
                       <li><b>length</b>1</li>
                       <li><b>color</b>blue</li>
                       <li><b>metal</b>steel</li>
                    </ul>
                 </li>
              </ul>
           </div>
        </div>'''
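soup = BeautifulSoup(raw, "lxml")  # the "lxml" parser requires the lxml package

# select every <b> element inside the div with id="specs"
result = soup.select("div#specs b")
for r in result:
    # the value is the text node that immediately follows each <b>
    print(r.get_text(), r.next_sibling)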

Output:

width 2
length 1
color blue
metal steel
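
If you want the comma-separated format from the question, a small variation on the same idea might look like the sketch below. It assumes the value is always the text node directly after each b element, it uses a trimmed copy of the markup, and it swaps in Python's built-in "html.parser" so that no extra parser needs to be installed. The names rows, label and value are only illustrative.

```python
from bs4 import BeautifulSoup

raw = """<div id="specs" class="pane">
  <ul class="list">
    <li>
      <ul>
        <li><b>width</b>2</li>
        <li><b>length</b>1</li>
        <li><b>color</b>blue</li>
        <li><b>metal</b>steel</li>
      </ul>
    </li>
  </ul>
</div>"""

# "html.parser" ships with Python; "lxml" would work here as well
soup = BeautifulSoup(raw, "html.parser")

rows = []
for b in soup.select("div#specs b"):
    label = b.get_text(strip=True)
    # next_sibling is the node right after </b>; fall back to "" if it is missing
    value = str(b.next_sibling or "").strip()
    rows.append(f"{label}, {value}")

print("\n".join(rows))
```

Collecting the lines in a list first keeps the formatting separate from the scraping, so the same rows could be written to a CSV file instead of printed.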

The following is a pure lxml.html alternative for comparison (since the OP seems interested in lxml, judging from the comment below). The output is exactly the same as that of the BeautifulSoup snippet above.

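from lxml import html
raw = '''assume the same HTML as in the previous snippet'''
root = html.fromstring(raw)

# cssselect() requires the separate cssselect package
result = root.cssselect("div#specs b")
for b in result:
    # b.tail is the text that follows the closing </b> tag
    print(b.text, b.tail)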

lxml supports both XPath (via xpath()) and CSS selectors (via cssselect()), and lxml is fast.
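
For completeness, here is a rough sketch of the XPath route mentioned above. It assumes the same markup (trimmed here) and that each value sits in the tail text of its b element. The XPath expression is my own equivalent of the "div#specs b" selector, and the name pairs is illustrative.

```python
from lxml import html

raw = """<div id="specs" class="pane">
  <ul class="list">
    <li>
      <ul>
        <li><b>width</b>2</li>
        <li><b>length</b>1</li>
        <li><b>color</b>blue</li>
        <li><b>metal</b>steel</li>
      </ul>
    </li>
  </ul>
</div>"""

root = html.fromstring(raw)

# XPath counterpart of the CSS selector "div#specs b"
pairs = []
for b in root.xpath('//div[@id="specs"]//b'):
    # b.text is the label; b.tail is the text after </b> (may be None)
    pairs.append((b.text, (b.tail or "").strip()))

for label, value in pairs:
    print(f"{label}, {value}")
```

Unlike cssselect(), xpath() is built into lxml, so this version does not need the extra cssselect package.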

Answer 1
0 2016-04-24 01:52:00
