

Beautiful Soup: getting the first child

How can I get the first child?

<div class="cities">
    <div id="3232"> London </div>
    <div id="131"> York </div>
</div>

How can I get London?

for div in nsoup.find_all(class_='cities'):
    print (div.children.contents)

AttributeError: 'listiterator' object has no attribute 'contents'

div.children returns an iterator, and an iterator has no .contents attribute.

for div in nsoup.find_all(class_='cities'):
    for childdiv in div.find_all('div'):
        print(childdiv.string)  # London, then York

An AttributeError can also be raised because non-tag nodes, such as the '\n' whitespace strings, appear in .children. Just use a proper child selector to find the specific div.
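A minimal sketch of what .children actually yields for the question's HTML, assuming bs4 is installed and using the built-in 'html.parser'. It shows the whitespace strings coming back as NavigableString objects between the Tag objects, then filters them out with isinstance(..., Tag). The helper name element_children is hypothetical, made up for this example.

```python
from bs4 import BeautifulSoup, Tag

HTML = '''
<div class="cities">
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>
'''


def element_children(tag):
    """Hypothetical helper: return only the Tag children of `tag`, skipping text nodes."""
    return [child for child in tag.children if isinstance(child, Tag)]


soup = BeautifulSoup(HTML, 'html.parser')
cities = soup.find(class_='cities')

# .children is an iterator over every child node, whitespace strings included
print([type(child).__name__ for child in cities.children])
# ['NavigableString', 'Tag', 'NavigableString', 'Tag', 'NavigableString']

# Keeping only Tag objects leaves the two city divs
print([tag.get_text(strip=True) for tag in element_children(cities)])
# ['London', 'York']
```

Because the very first child is a whitespace string, anything that assumes "first child == first div" will pick up that string instead of the London div.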

Edit: I can't reproduce your exceptions. Here's what I've done:

In [137]: print(foo.prettify())
<div class="cities">
 <div id="3232">
  London
 </div>
 <div id="131">
  York
 </div>
</div>

In [138]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print(childdiv.string)
   .....: 
 London 
 York 

In [139]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print(childdiv.string, childdiv['id'])
   .....: 
 London  3232
 York  131

With modern versions of bs4 (certainly 4.7.1+) you have access to the :first-child CSS pseudo-class. Nice and descriptive. Use soup.select_one if you only want to return the first match, i.e. soup.select_one('.cities div:first-child').text. It is considered good practice to test that the result is not None before using the .text accessor.

from bs4 import BeautifulSoup as bs

html = '''
<div class="cities"> 
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>
  '''
soup = bs(html, 'lxml') #or 'html.parser'
first_children = [i.text for i in soup.select('.cities div:first-child')]
print(first_children)
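The select_one variant with the "is not None" check mentioned above might look like the sketch below. It assumes bs4 4.7.1+ (so :first-child is supported) and uses 'html.parser' to avoid the lxml dependency; the function name first_city is hypothetical.

```python
from bs4 import BeautifulSoup

HTML = '''
<div class="cities">
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>
'''


def first_city(html):
    """Hypothetical helper: text of the first child div under .cities, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # select_one returns the first matching element, or None when nothing matches
    match = soup.select_one('.cities div:first-child')
    if match is not None:
        return match.get_text(strip=True)
    return None


print(first_city(HTML))                         # London
print(first_city('<div class="towns"></div>'))  # None
```

Guarding on None means a page without a .cities block returns None instead of raising AttributeError on .text.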

The currently accepted answer returns all cities, whereas the question only asks for the first one.

If you only need the first child, you can take advantage of .children returning an iterator rather than a list. An iterator yields its items on demand, and because we only need its first element, next() stops there instead of walking through all the other city elements. (The saving is modest: Beautiful Soup has already parsed every child into div.contents, and .children simply iterates over that list.)

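from bs4 import Tag

for div in nsoup.find_all(class_='cities'):
    # .children also yields whitespace strings such as '\n', so skip ahead to the first Tag
    first_child = next((child for child in div.children if isinstance(child, Tag)), None)
    if first_child is not None:
        print(first_child.get_text(strip=True))  # London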
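The laziness this answer relies on can be seen in plain Python, independent of Beautiful Soup. In this sketch a generator records every item it produces, showing that next() pulls exactly one item and that the default argument of next() covers an empty iterator. The produce generator is hypothetical.

```python
def produce(items, log):
    """Hypothetical generator: yield items one at a time, recording each one produced."""
    for item in items:
        log.append(item)
        yield item


log = []
cities = produce(['London', 'York', 'Leeds'], log)

# next() pulls a single item; the rest have not been produced yet
first = next(cities, None)
print(first)  # London
print(log)    # ['London']

# list() drains whatever is left in the same iterator
rest = list(cities)
print(rest)   # ['York', 'Leeds']
print(log)    # ['London', 'York', 'Leeds']

# With a default, next() returns it instead of raising StopIteration
empty = next(iter([]), None)
print(empty)  # None
```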

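Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.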

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM