I tried many times to grab data from booking.com, but I couldn't
I want to scrape data from booking.com, but I got some errors and couldn't find any similar code. I want to scrape the name of the hotel, the price, etc.
I have tried BeautifulSoup 4 and tried to get the data into a CSV file.
import requests
from bs4 import BeautifulSoup
import pandas
# Replace search_url with a valid one by visiting and searching booking.com
search_url = 'https://www.booking.com/searchresults.....'
page = requests.get(search_url)
soup = BeautifulSoup(page.content, 'html.parser')
week = soup.find(id = 'search_results_table' )
#print(week)
items = week.find_all(class_='sr-hotel__name')
print(items[0])
print(items[0].find(class_ = 'sr-hotel__name').get_text())
print(items[0].find(class_ = 'short-desc').get_text())
Here is a sample URL that can be used in place of search_url.
This is the error message:
<span class="sr-hotel__name " data-et-click="
">
The Fort Printers
</span>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-77b38c8546bb> in <module>
11 items = week.find_all(class_='sr-hotel__name')
12 print(items[0])
---> 13 print(items[0].find(class_ = 'sr-hotel__name').get_text())
14 print(items[0].find(class_ = 'short-desc').get_text())
15
AttributeError: 'NoneType' object has no attribute 'get_text'
First of all, buddy, using requests might be really hard, since you have to completely imitate the request your browser sends. You'll have to use a sniffing tool (Burp, Fiddler, Wireshark) or, in some cases, look at the Network tab of your browser's developer tools, which is relatively hard...
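A minimal sketch of the "imitate the browser" idea with requests, assuming that a browser-like `User-Agent` and `Accept-Language` are enough for your page (they may not be; the site can also require cookies or other headers). The header values and the helper names `make_session` and `fetch` are illustrative assumptions; copy the real values from the Network tab of your own browser.

```python
import requests

# Illustrative browser-like headers (assumption: replace them with the
# values your browser actually sends, as shown in its Network tab).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def make_session() -> requests.Session:
    """Return a Session that sends the browser-like headers on every request."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def fetch(url: str, session: requests.Session) -> str:
    """Download a page and raise on HTTP errors such as 403 or 503."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    return response.text
```

Usage: `html = fetch(search_url, make_session())`, then pass `html` to `BeautifulSoup` as before. Reusing one Session also keeps any cookies the site sets between requests.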
I'd suggest using Selenium, a browser-automation tool (it drives a real browser through WebDriver) that makes your life easier when scraping sites. Read more about it here: https://medium.com/the-andela-way/introduction-to-web-scraping-using-selenium-7ec377a8cf72
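A rough Selenium sketch, assuming Selenium 4 and a local Chrome installation. The function name `scrape_hotel_names` is hypothetical, and the class name `sr-hotel__name` is taken from the question's markup, which booking.com may have changed since.

```python
def scrape_hotel_names(url: str) -> list[str]:
    """Open url in headless Chrome and return the text of each hotel-name element.

    Assumes Selenium 4 and a local Chrome install; 'sr-hotel__name' comes
    from the question and may no longer match the site's current markup.
    """
    # Imported inside the function so that defining it does not
    # require Selenium to be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        elements = driver.find_elements(By.CLASS_NAME, "sr-hotel__name")
        # .text is the rendered text; strip() drops surrounding whitespace.
        return [element.text.strip() for element in elements]
    finally:
        # Always close the browser, even if something above fails.
        driver.quit()
```

If the results are loaded by JavaScript after the initial page render, you may need an explicit wait (Selenium's `WebDriverWait`) before calling `find_elements`.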
As for your error, I think you should use just .text instead of .get_text().
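A quick check of that suggestion, using the `<span>` printed in the question as a literal string (a sketch, not real page output). On a real Tag, `.text` and `.get_text()` return the same string, so the switch alone does not avoid the error: the `AttributeError` comes from calling either one on `None`.

```python
from bs4 import BeautifulSoup

# The element printed in the question, reproduced as a literal string.
HTML = '<span class="sr-hotel__name ">\nThe Fort Printers\n</span>'
span = BeautifulSoup(HTML, "html.parser").span

# On a real Tag, the .text property and .get_text() return the same string.
same_text = span.text == span.get_text()

# The failing call in the question operates on None, not on a Tag,
# so .text raises the same AttributeError that .get_text() did.
result = span.find(class_="sr-hotel__name")  # None: no matching descendant
try:
    result.text
    text_fixes_it = True
except AttributeError:
    text_fixes_it = False
```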
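Instead of calling the find() method again, call getText() directly on the elements you already have. items already holds the sr-hotel__name spans, and find() only searches an element's descendants, so items[0].find(class_='sr-hotel__name') returns None.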
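A sketch of the direct approach on a tiny hand-written page. Only the `search_results_table` id and the `sr-hotel__name` class come from the question; the wrapper markup is invented for illustration. The nested `find()` comes back as `None`, while calling `get_text()` on each item works; `get_text(strip=True)` also removes the surrounding newlines.

```python
from bs4 import BeautifulSoup

# Hypothetical, stripped-down markup: only the id and the class name
# come from the question; the structure is invented for illustration.
PAGE = """
<div id="search_results_table">
  <div class="hotel"><span class="sr-hotel__name ">
The Fort Printers
</span></div>
</div>
"""

soup = BeautifulSoup(PAGE, "html.parser")
week = soup.find(id="search_results_table")
items = week.find_all(class_="sr-hotel__name")

# Wrong: searches inside the span for another element with the same class.
nested = items[0].find(class_="sr-hotel__name")

# Right: each item is already the span you want, so read its text directly.
names = [item.get_text(strip=True) for item in items]
```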
import requests
from bs4 import BeautifulSoup
import pandas
# Replace search_url with a valid one by visiting and searching booking.com
search_url = 'https://www.booking.com/searchresults.....'
page = requests.get(search_url)
soup = BeautifulSoup(page.content, 'html.parser')
week = soup.find(id = 'search_results_table' )
#print(week)
items = week.find_all(class_='sr-hotel__name')
# print the whole thing
print(items[0])
hotel_name = items[0].getText()
# print hotel name
print(hotel_name)
# print without the surrounding newlines
print(hotel_name.strip())
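Since the goal was a CSV file, one way to finish the job, assuming `items` was collected as above, is to strip each name and write it with the standard-library `csv` module. The function name `write_hotel_names` and the single `hotel_name` column are illustrative; a price column would need the class name of the price element, which isn't shown here.

```python
import csv
import io
from typing import Iterable, TextIO

from bs4 import BeautifulSoup
from bs4.element import Tag


def write_hotel_names(items: Iterable[Tag], out: TextIO) -> int:
    """Write one stripped hotel name per row to out; return the number of rows."""
    writer = csv.writer(out)
    writer.writerow(["hotel_name"])  # header row
    count = 0
    for item in items:
        writer.writerow([item.get_text(strip=True)])
        count += 1
    return count


# Example with a hand-written snippet (not real booking.com output),
# writing to an in-memory buffer instead of a file.
demo = BeautifulSoup(
    '<span class="sr-hotel__name ">\nThe Fort Printers\n</span>', "html.parser"
)
buffer = io.StringIO()
rows = write_hotel_names(demo.find_all(class_="sr-hotel__name"), buffer)
```

For a real file, open it with `open("hotels.csv", "w", newline="", encoding="utf-8")` and pass the file object instead of the buffer. Since the question already imports pandas, `pandas.DataFrame({"hotel_name": names}).to_csv("hotels.csv", index=False)` is an equivalent alternative.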
Hope this helps. I would suggest reading more of the BeautifulSoup documentation.