
Web Scraping with Python BeautifulSoup 4

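I am new to web scraping and tried it out after watching a few tutorial videos online. I decided to use Tripadvisor.com and try to collect data from customer reviews.

Here is the code I came up with: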

from urllib.request import urlopen as uReq

from bs4 import BeautifulSoup as soup

my_url = 'https://www.tripadvisor.com.sg/Attraction_Review-g293916-d12033454-Reviews-SHOW_DC-Bangkok.html'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("div",{"class":"ui_column is-9"})

for container in containers:
    rating = container.div.div.div.span["class"]

    comment_container = container.p
    comment = comment_container[0]

    print("rating" + rating)
    print("comment" + comment)

This is the output of my code:

Traceback (most recent call last):
  File "trip_advisor.py", line 18, in <module>
    comment = comment_container[0]
  File "/anaconda/lib/python3.6/site-packages/bs4/element.py", line 1011, in __getitem__
    return self.attrs[key]
KeyError: 0

Can anyone help me solve this problem? Thanks.

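You cannot get at the content of a <class 'bs4.element.Tag'> by indexing it with a number. On a Tag, tag[key] looks up an HTML attribute, which is why comment_container[0] ends in self.attrs[key] and raises KeyError: 0. To reach the child nodes of a <class 'bs4.element.Tag'>, use .contents: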

>>> container.p
<p class="partial_entry">I was there couple of weeks ago on the weekend. There was an event but it was not very crowded thought and I actually like it. What drawn my attention is the PUB on the roof top. A Pub in a department store sound pretty...<span class="taLnk ulBlueLinks" onclick="ta.prwidgets.call('handlers.clickExpand',event,this);">More</span></p>
>>> container.p.contents[0]
'I was there couple of weeks ago on the weekend. There was an event but it was not very crowded thought and I actually like it. What drawn my attention is the PUB on the roof top. A Pub in a department store sound pretty...'
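
The difference between attribute lookup and child access can be reproduced without fetching the live page. The following is a minimal sketch, assuming a small hand-written HTML snippet as a stand-in for one review paragraph (it is not the real Tripadvisor markup); the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one review paragraph, not the real page markup.
html = '<p class="partial_entry">Nice place...<span class="taLnk">More</span></p>'
p = BeautifulSoup(html, "html.parser").p

# Indexing a Tag with a string reads an HTML attribute.
css_classes = p["class"]

# Indexing with 0 is also treated as an attribute lookup, hence the KeyError.
try:
    p[0]
    index_error = None
except KeyError as exc:
    index_error = exc

# .contents is a plain list of child nodes, so integer indexing works there.
first_child = p.contents[0]

print("class attribute:", css_classes)
print("error from p[0]:", repr(index_error))
print("first child:", first_child)
```

Here p.contents has two entries: the text before the <span> and the <span> tag itself, so contents[0] returns only the truncated review text without the "More" link.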

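Apart from this issue, I am not sure your rating extraction really gives you what you need, since it returns the list of CSS classes on the span rather than a score, but this fixes the main error: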

for container in containers:
    rating = container.div.div.div.span["class"]
    comment_container = container.p.contents
    comment = comment_container[0]
    print("Rating: ", rating)
    print("Comment: " + comment)

Output:

Rating:  ['ui_bubble_rating', 'bubble_40']
Comment: I was there couple of weeks ago on the weekend. There was an event but it was not very crowded thought and I actually like it. What drawn my attention is the PUB on the roof top. A Pub in a department store sound pretty...
Rating:  ['ui_bubble_rating', 'bubble_50']
Comment: Show dc is very fascinating place that you must to go. The mega complex is very special from the others mall in thailand. I think you can touching and feeling of the hapiness. I went there few days ago for find some dining and spending...
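
The rating printed here is a list of CSS class names rather than a number. If a numeric score is wanted, one possible approach is to parse the bubble_NN class. This is only a sketch: parse_bubble_rating is a hypothetical helper, and it assumes that bubble_40 stands for 4.0 out of 5, which is inferred from the class name and not confirmed by the page or by the original answer.

```python
import re

def parse_bubble_rating(css_classes):
    """Return a float score from classes like ['ui_bubble_rating', 'bubble_40'].

    Assumes 'bubble_NN' encodes the score multiplied by ten (an assumption).
    Returns None when no such class is present.
    """
    for name in css_classes:
        match = re.fullmatch(r"bubble_(\d+)", name)
        if match:
            return int(match.group(1)) / 10
    return None

print(parse_bubble_rating(['ui_bubble_rating', 'bubble_40']))  # 4.0
print(parse_bubble_rating(['ui_bubble_rating', 'bubble_50']))  # 5.0
```

Inside the loop this could be applied as rating = parse_bubble_rating(container.div.div.div.span["class"]), with a check for None in case the class naming on the page differs.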

