
Accessing HTML elements with BeautifulSoup (Python 2.7)

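I can't get all of the href attribute values from this HTML into a list. I'm not sure what I'm doing wrong; I can't even access the links.

Below is an excerpt of what I'm trying to parse: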

 <!-- <div class="container">
    <div class="row">
        <div class="col-xs-12 col-md-offset-2 col-md-8 col-md-offset-2">
            <div id='location_list'><h2>Browse by location</h2><ol class='suburb_locations'><div class="row"><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/nsw/abbotsford-nsw">abbotsford, NSW</a><br><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/abbotsford-vic">abbotsford, VIC</a><br><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span>

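I'm trying to build a dictionary where key = place and value = the number of active borrowers, plus a list containing the href values. My biggest problem is that I can't access any of these siblings. I've tried many things; below is some of the code I've been trying: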

    from bs4 import BeautifulSoup
    soup=BeautifulSoup(html,"html5lib")
    print soup.find_all('br')
    print soup.find_all('div h2 ol li')
    print soup.find('li',{'class':"col-sm-3"})

The problem is the `<!--`. If you print the soup, you'll see that there is nothing in it; once you remove the `<!--`, you get the HTML.

In [2]: from bs4 import BeautifulSoup

In [3]: soup = BeautifulSoup(html,"lxml")

In [4]: print(soup)


In [5]: soup = BeautifulSoup(html.replace("<!--",""),"lxml")

In [6]: print(soup)
<html><body><div class="container">
<div class="row">
<div class="col-xs-12 col-md-offset-2 col-md-8 col-md-offset-2">
<div id="location_list"><h2>Browse by location</h2><ol class="suburb_locations"><div class="row"><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/nsw/abbotsford-nsw">abbotsford, NSW</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/abbotsford-vic">abbotsford, VIC</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span></li></div></ol></div></div></div></div></body></html>
In [7]: soup.select(".col-sm-3")
Out[7]: 
[<li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/nsw/abbotsford-nsw">abbotsford, NSW</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span></li>,
 <li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/abbotsford-vic">abbotsford, VIC</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span></li>]

In [8]: soup.select(".col-sm-3")[0].text
Out[8]: u'abbotsford, NSW0  active owners0  active borrowers'
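The question asked for a dictionary of place to active-borrower count plus a list of href values. Here is a minimal sketch of one way to build both once the stray `<!--` is stripped. It assumes each `li.col-sm-3` holds one `a` and two `span.sub_title` elements, owners first and borrowers second, as in the markup in this thread. The names `borrowers` and `hrefs` are illustrative, the sample markup is trimmed from the thread, and `html.parser` is used here only to avoid an extra dependency:

```python
from bs4 import BeautifulSoup

# Trimmed sample based on the markup in this thread, including the stray "<!--".
html = '''<!-- <div id='location_list'><ol class='suburb_locations'><div class="row">
<li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/nsw/abbotsford-nsw">abbotsford, NSW</a><br><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span></li>
<li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/nsw/alexandria">alexandria, NSW</a><br><span class="sub_title">1  active owner</span><span class="sub_title">53  active borrowers</span></li>
</div></ol></div>'''

# Remove the opening comment marker so the markup is parsed as real tags.
soup = BeautifulSoup(html.replace("<!--", ""), "html.parser")

borrowers = {}  # place name -> number of active borrowers
hrefs = []      # every link target, in page order

for li in soup.select("li.col-sm-3"):
    link = li.find("a")
    spans = li.find_all("span", class_="sub_title")
    if link is None or len(spans) < 2:
        continue  # skip items that do not follow the assumed layout
    hrefs.append(link["href"])
    # The second span reads like "53  active borrowers"; keep the leading number.
    borrowers[link.get_text(strip=True)] = int(spans[1].get_text().split()[0])

print(borrowers)
print(hrefs)
```

If the span order on the live page differs, matching on the word "borrower" in the span text would be a safer way to pick the right one.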

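I'm not sure where you got the HTML from, but if you want to parse it, you need to clean it up.

The container is completely commented out, but you only need to replace the opening `<!--` and you get the following section from the page source: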

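import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.carnextdoor.com.au/find-a-car/")

# Drop the opening comment marker so the parser sees the markup as real tags.
# r.text is decoded text, so the replace works on Python 2 and Python 3 alike.
soup = BeautifulSoup(r.text.replace("<!--", ""), "lxml")

print(soup.select("div #location_list"))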

This gives you:

[<div id="location_list"><h2>Browse by location</h2><ol class="suburb_locations"><div class="row"><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/nsw/abbotsford-nsw">abbotsford, NSW</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/abbotsford-vic">abbotsford, VIC</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">0  active borrowers</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/aberfeldie">aberfeldie, VIC</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">1  active borrower</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/sa/adelaide">adelaide, SA</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">1  active borrower</span></li></div><div class="row"><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/act/ainslie">ainslie, ACT</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">1  active borrower</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/aireys-inlet">aireys inlet, VIC</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">1  active borrower</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/airly">airly, VIC</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">1  active borrower</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/airport-west">airport west, VIC</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">1  active borrower</span></li></div><div class="row"><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/vic/albert-park">albert park, VIC</a><br/><span class="sub_title">0  active 
owners</span><span class="sub_title">5  active borrowers</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/sa/aldgate">aldgate, SA</a><br/><span class="sub_title">1  active owner</span><span class="sub_title">2  active borrowers</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/nsw/alexandria">alexandria, NSW</a><br/><span class="sub_title">1  active owner</span><span class="sub_title">53  active borrowers</span></li><li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/nsw/alexandria-mc">alexandria mc, NSW</a><br/><span class="sub_title">0  active owners</span><span class="sub_title">1  active borrower</span></li></div><div class="row"><li class="col-sm-3"><a href="http://www.carnextdoor.com

There is more; basically everything of interest is inside that commented-out section.
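A different technique from the string replacement used in this answer is to let Beautiful Soup find the comment node and then parse the comment's text as its own document. The sketch below assumes the comment on the real page is properly closed with `-->`, which the excerpt in the question does not show. The `page` string is a made-up stand-in built from one list item in the output above, and the names `inner` and `links` are illustrative:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the page: the interesting markup sits inside a
# closed HTML comment.
page = '''<html><body><p>visible</p>
<!-- <div id="location_list"><ol class="suburb_locations">
<li class="col-sm-3"><a href="http://www.carnextdoor.com.au/car-rental/sa/aldgate">aldgate, SA</a><br><span class="sub_title">1  active owner</span><span class="sub_title">2  active borrowers</span></li>
</ol></div> -->
</body></html>'''

soup = BeautifulSoup(page, "html.parser")

# Comment is a NavigableString subclass, so it can be matched with a string filter.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

inner = None
for comment in comments:
    # Parse the comment body as HTML and keep the one holding the location list.
    candidate = BeautifulSoup(comment, "html.parser")
    if candidate.select_one("#location_list") is not None:
        inner = candidate
        break

links = [a["href"] for a in inner.select("#location_list a")] if inner else []
print(links)
```

This leaves any other comments on the page untouched, whereas replacing every `<!--` turns all commented-out markup into live tags.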
