
CSS children selector (not being able to select all children)

[Image]

This is the image of what I'm trying to scrape using Beautiful Soup. Whenever I use the code shown below, I only get access to the first child; I am never able to access all the children. Can someone help me with this?

item = soup.select("ul.items > li")
print(len(item))

The problem can be fixed in 2 steps as follows:

  1. Use select_one on soup to get the ul.
  2. Use find_all on the ul to fetch all the li items.

Working solution:

# File name: soup-demo.py
from bs4 import BeautifulSoup

inputHTML = """
<ul class="items">
<li class="class1">item 1</li>
<li class="class1">item 3</li>
<li class="class1">item 3</li>
</ul>
"""

soup = BeautifulSoup(inputHTML, 'html.parser')

# select_one takes a CSS selector, so the class goes in the selector itself.
# (The class_ keyword belongs to find/find_all, not to select_one.)
itemList = soup.select_one("ul.items")

# find_all returns every li inside the ul
items = itemList.find_all("li")
print("Found ", len(items), " items")
for item in items:
    print(item)

Output:

$ python3 soup-demo.py 
Found  3  items
<li class="class1">item 1</li>
<li class="class1">item 3</li>
<li class="class1">item 3</li>
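The two approaches (the child selector from the question and select_one followed by find_all) agree on a flat list, but they can differ when lists are nested. Here is a minimal sketch of that difference; the nested markup is made up for illustration and is not taken from the page in the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a nested list, used only to show where the approaches differ.
html = """
<ul class="items">
  <li>item 1</li>
  <li>item 2
    <ul><li>nested item</li></ul>
  </li>
  <li>item 3</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: only li elements that are direct children of ul.items
direct_css = soup.select("ul.items > li")

ul = soup.select_one("ul.items")
# find_all searches all descendants by default, so the nested li is included
all_descendants = ul.find_all("li")
# recursive=False restricts find_all to direct children
direct_only = ul.find_all("li", recursive=False)

print(len(direct_css), len(all_descendants), len(direct_only))  # 3 4 3
```

If only the top-level items are wanted, either the `>` selector or `recursive=False` should do; plain find_all is the one that also picks up nested items.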

Maybe your installed version is the problem. This works fine for me:

from bs4 import BeautifulSoup
html = '''
<ul class="items">
  <li>1</li>
  <li>2</li>
</ul>
'''
soup = BeautifulSoup(html, features="lxml")
item = soup.select('ul.items > li')
print(len(item))
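To check whether the installed version might be involved, one option is to print the versions of Beautiful Soup and of Soup Sieve, the package that recent Beautiful Soup releases use for CSS selectors. This is only a diagnostic sketch and does not say which version the asker has:

```python
import bs4
import soupsieve

# Both packages expose their version as a string attribute.
print("beautifulsoup4:", bs4.__version__)
print("soupsieve:", soupsieve.__version__)
```

If the numbers look old, upgrading both packages with pip and rerunning the selector is a cheap thing to try.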

Here's another solution, using simplified_scrapy:

from simplified_scrapy.simplified_doc import SimplifiedDoc
html = '''
<ul class="items">
  <li>1</li>
  <li>2</li>
</ul>
'''
doc = SimplifiedDoc(html)
item = doc.selects('ul.items>li')
print(len(item))

More examples are available here.
