I have tried many times to get data from booking.com, but I can't

Question

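I want to scrape data from booking.com, but I am getting errors and can't find any similar code. I want to scrape hotel names, prices, and so on.

I have already tried BeautifulSoup 4 and tried to write the data to a CSV file.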

import requests
from bs4 import BeautifulSoup
import pandas

# Replace search_url with a valid one by visiting and searching booking.com
search_url = 'https://www.booking.com/searchresults.....'
page = requests.get(search_url)
soup = BeautifulSoup(page.content, 'html.parser')

week = soup.find(id = 'search_results_table'  )
#print(week)

items = week.find_all(class_='sr-hotel__name')
print(items[0])
print(items[0].find(class_ = 'sr-hotel__name').get_text())
print(items[0].find(class_ = 'short-desc').get_text())

Here is a sample URL that can be used in place of search_url.

This is the error message...

<span class="sr-hotel__name " data-et-click="
">
The Fort Printers
</span>
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-44-77b38c8546bb> in <module>
     11 items = week.find_all(class_='sr-hotel__name')
     12 print(items[0])
---> 13 print(items[0].find(class_ = 'sr-hotel__name').get_text())
     14 print(items[0].find(class_ = 'short-desc').get_text())
     15 

AttributeError: 'NoneType' object has no attribute 'get_text'

Answer 1

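First of all, using requests can be really hard, because you have to reproduce exactly the request a browser would send. You would need a sniffing tool (Burp, Fiddler, Wireshark) or, in some cases, the Network tab of your browser's developer tools, and that is relatively difficult...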
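As a rough illustration of what "reproducing the browser's request" can look like with requests, the sketch below attaches browser-like headers to a request and inspects it without sending anything. The header values and the `build_request` helper are assumptions made up for this example, not values booking.com is known to require; copy the real ones from your browser's Network tab.

```python
import requests

# Hypothetical header values; copy the real ones from your browser's
# developer tools (Network tab) for the request you want to imitate.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Prepare (but do not send) a GET request carrying browser-like headers."""
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    return session, session.prepare_request(request)


session, prepared = build_request("https://www.booking.com/")
print(prepared.headers["User-Agent"])
# To actually fetch the page: response = session.send(prepared, timeout=10)
```

Even with matching headers, the site may still serve different markup to scripts than to a real browser, which is where a browser driver can be the simpler route.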

I suggest you use Selenium, a web driver that makes scraping websites much easier. Read more about it here: https://medium.com/the-andela-way/introduction-to-web-scraping-using-selenium-7ec377a8cf72

As for your error, I think you should use just .text instead of .get_text()

Answer 2

It would help if you called the getText() method directly on the element instead of calling find() a second time.

import requests
from bs4 import BeautifulSoup
import pandas

# Replace search_url with a valid one by visiting and searching booking.com
search_url = 'https://www.booking.com/searchresults.....'
page = requests.get(search_url)
soup = BeautifulSoup(page.content, 'html.parser')

week = soup.find(id = 'search_results_table'  )
#print(week)

items = week.find_all(class_='sr-hotel__name')
# print the whole thing
print(items[0])
hotel_name = items[0].getText()

# print hotel name
print(hotel_name)

# print without the surrounding newlines
print(hotel_name.strip())
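
To see why the original call raised the AttributeError, here is a minimal offline sketch. The HTML string is a made-up fragment modelled on the span printed in the question, not the real booking.com markup. find_all() already returns the span itself, so searching inside it for the same class finds nothing and find() returns None.

```python
from bs4 import BeautifulSoup

# Made-up fragment modelled on the output shown in the question.
html = """
<div id="search_results_table">
  <span class="sr-hotel__name " data-et-click="">
The Fort Printers
</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find(id="search_results_table")
items = table.find_all(class_="sr-hotel__name")

# items[0] is already the <span>; it has no descendant with that class.
nested = items[0].find(class_="sr-hotel__name")
print(nested)  # None, hence "'NoneType' object has no attribute 'get_text'"

# Read the text from the matched element itself; strip=True trims whitespace.
hotel_name = items[0].get_text(strip=True)
print(hotel_name)
```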

Hope this helps. I recommend reading more of the BeautifulSoup documentation.
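
Since the question also mentions saving the results to CSV, here is a small sketch of that step using the standard csv module. The HTML fragment and the second hotel name are invented for illustration, and only a name column is written because the question does not show the markup for prices.

```python
import csv
import io

from bs4 import BeautifulSoup

# Invented fragment for illustration; real pages will differ.
html = """
<div id="search_results_table">
  <span class="sr-hotel__name">
The Fort Printers
</span>
  <span class="sr-hotel__name">
Example Hotel
</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find(id="search_results_table")

# Guard against a missing container (e.g. the site served a different page).
spans = table.find_all(class_="sr-hotel__name") if table else []
names = [span.get_text(strip=True) for span in spans]

# Write to an in-memory buffer; use open("hotels.csv", "w", newline="")
# instead to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name"])
writer.writerows([name] for name in names)
print(buffer.getvalue())
```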

2 answers

Answer 1
0 2019-10-06 10:02:17

Answer 2
0 Accepted 2019-10-06 10:21:52
