
I tried many times to grab data from booking.com, but I couldn't

I want to scrape data from booking.com, but I get errors and couldn't find any similar code. I want to scrape the hotel name, price, etc.

I have tried BeautifulSoup 4 and tried to write the data to a CSV file.

import requests
from bs4 import BeautifulSoup
import pandas

# Replace search_url with a valid one by visiting and searching booking.com
search_url = 'https://www.booking.com/searchresults.....'
page = requests.get(search_url)
soup = BeautifulSoup(page.content, 'html.parser')

week = soup.find(id = 'search_results_table'  )
#print(week)

items = week.find_all(class_='sr-hotel__name')
print(items[0])
print(items[0].find(class_ = 'sr-hotel__name').get_text())
print(items[0].find(class_ = 'short-desc').get_text())

Here is a sample URL that can be used in place of search_url.

This is the error message:

<span class="sr-hotel__name " data-et-click="
">
The Fort Printers
</span>
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-44-77b38c8546bb> in <module>
     11 items = week.find_all(class_='sr-hotel__name')
     12 print(items[0])
---> 13 print(items[0].find(class_ = 'sr-hotel__name').get_text())
     14 print(items[0].find(class_ = 'short-desc').get_text())
     15 

AttributeError: 'NoneType' object has no attribute 'get_text'

First of all, buddy, using requests alone can be really hard here, because you have to closely imitate the request your browser sends (headers, cookies, and so on). To see what that request looks like, you'll have to use a sniffing tool (Burp, Fiddler, Wireshark) or, in some cases, look at the Network tab in your browser's developer tools, which is relatively hard.
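As a rough illustration of what imitating the browser's request can look like with requests, here is a minimal sketch. The header values are assumptions for illustration only (copy the real ones from your browser's developer tools), and search_url is the same placeholder as in the question. Sending these headers does not guarantee that the site returns the same page a browser would get.

```python
import requests

# Placeholder copied from the question; replace it with a real search URL.
search_url = 'https://www.booking.com/searchresults.....'

# Assumed header values for illustration; copy the real ones from the
# Network tab of your browser's developer tools.
browser_headers = {
    'User-Agent': (
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/120.0 Safari/537.36'
    ),
    'Accept-Language': 'en-US,en;q=0.9',
}

# A Session sends the same headers (and keeps cookies) on every request.
session = requests.Session()
session.headers.update(browser_headers)

# Build the request without sending it, to inspect what would go out.
prepared = session.prepare_request(requests.Request('GET', search_url))
print(prepared.headers['User-Agent'])

# To actually fetch the page:
# page = session.get(search_url, timeout=10)
# page.raise_for_status()
```

Using a Session rather than passing headers to each requests.get() call keeps cookies between requests, which is part of what a browser does.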

I'd suggest using Selenium, a browser automation tool (WebDriver) that makes your life easier when scraping sites like this. Read more about it here: https://medium.com/the-andela-way/introduction-to-web-scraping-using-selenium-7ec377a8cf72
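A minimal sketch of the Selenium route, assuming Chrome and a matching driver are available on the machine. The helper names fetch_rendered_html and hotel_names are hypothetical, and the sr-hotel__name class is taken from the question, so it may not match the site's current markup.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load url in a real browser and return its HTML source."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


def hotel_names(html):
    """Return the text of every element with the class used in the question."""
    soup = BeautifulSoup(html, 'html.parser')
    return [tag.get_text(strip=True) for tag in soup.find_all(class_='sr-hotel__name')]


# Offline check with a static snippet modelled on the question's output:
sample = '<span class="sr-hotel__name">\nThe Fort Printers\n</span>'
names = hotel_names(sample)
print(names)

# With a browser available:
# names = hotel_names(fetch_rendered_html(search_url))
```

Content that the page loads after the initial render may need an explicit wait before reading page_source; this sketch does not handle that.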

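As for your error, I think you should use .text instead of .get_text(), and call it on items[0] itself: items[0] is already the sr-hotel__name span, so items[0].find(class_='sr-hotel__name') returns None, and that None is what raises the AttributeError.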
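The AttributeError can be reproduced offline. The sketch below uses a static HTML string modelled on the span printed in the question (the surrounding div is an assumption) and shows that find() on the span returns None, while reading the text from the span itself works.

```python
from bs4 import BeautifulSoup

# Static HTML modelled on the <span> printed in the question; no network access.
html = '''
<div id="search_results_table">
  <span class="sr-hotel__name " data-et-click="">
The Fort Printers
</span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
table = soup.find(id='search_results_table')
items = table.find_all(class_='sr-hotel__name')

# items[0] is already the span, and find() only searches its descendants,
# so looking for the same class inside it finds nothing.
inner = items[0].find(class_='sr-hotel__name')
print(inner)

# Read the text from the element itself instead.
hotel_name = items[0].get_text(strip=True)
print(hotel_name)
```

The same reasoning applies to the short-desc lookup in the question: it has to be run on an element that contains the description, not on the name span.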

Instead of calling find() a second time, call getText() directly on the element you already found.

import requests
from bs4 import BeautifulSoup
import pandas

# Replace search_url with a valid one by visiting and searching booking.com
search_url = 'https://www.booking.com/searchresults.....'
page = requests.get(search_url)
soup = BeautifulSoup(page.content, 'html.parser')

week = soup.find(id = 'search_results_table'  )
#print(week)

items = week.find_all(class_='sr-hotel__name')
# print the whole thing
print(items[0])
hotel_name = items[0].getText()

# print hotel name
print(hotel_name)

# print without the surrounding newlines/whitespace
print(hotel_name.strip())
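To get from a single hotel name to the CSV file the question asks about, one possible sketch is below. It runs on a static HTML string rather than the live site. The sr_item wrapper class, the second hotel, the description text, and the helper names are assumptions for illustration; only sr-hotel__name, short-desc, and search_results_table come from the question.

```python
import csv
import io

from bs4 import BeautifulSoup

# Static stand-in for a results page. The 'sr_item' wrapper class is an
# assumption; 'sr-hotel__name' and 'short-desc' come from the question.
html = '''
<div id="search_results_table">
  <div class="sr_item">
    <span class="sr-hotel__name">The Fort Printers</span>
    <div class="short-desc">Example description A</div>
  </div>
  <div class="sr_item">
    <span class="sr-hotel__name">Example Hotel</span>
  </div>
</div>
'''


def text_or_empty(parent, class_name):
    """Return the stripped text of the first match, or '' if nothing matches."""
    tag = parent.find(class_=class_name)
    return tag.get_text(strip=True) if tag is not None else ''


def extract_rows(page_html):
    """Build one dict per hotel container found in the page."""
    soup = BeautifulSoup(page_html, 'html.parser')
    return [
        {
            'name': text_or_empty(item, 'sr-hotel__name'),
            'description': text_or_empty(item, 'short-desc'),
        }
        for item in soup.find_all(class_='sr_item')
    ]


def write_csv(rows, fileobj):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=['name', 'description'])
    writer.writeheader()
    writer.writerows(rows)


rows = extract_rows(html)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())

# To write a real file:
# with open('hotels.csv', 'w', newline='', encoding='utf-8') as f:
#     write_csv(rows, f)
```

Checking for None before calling get_text() avoids the AttributeError from the question when a hotel has no description element.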

Hope this helps. I would suggest reading more of the BeautifulSoup documentation.
