How can the data of this script element be accessed?
I'm trying to use requests and BeautifulSoup to access some metadata on a page.
Some script elements can be accessed, but seemingly not one in particular.
For example:
import requests
from bs4 import BeautifulSoup

response = BeautifulSoup(requests.get("https://www.booking.com/hotel/br/olympia-residence.en-gb.html", verify=False).content, "html.parser")
scriptData = response.select('script[type="text/javascript"]')
In the HTML, there is a script element with a window.utag_data variable, but scriptData only contains data from another script element.
I thought that this particular script element might be absent because it is loaded dynamically on the page, but if that's the case, I couldn't narrow down which response was delivering that data.
Is it possible to get window.utag_data with requests and BeautifulSoup?
It seems the website sends different HTML depending on how the request is made.
I can see window.utag_data if I access that page from the browser, but not if I fetch it with curl:
$ curl -s https://www.booking.com/hotel/br/olympia-residence.en-gb.html | grep utag_data
$
It also doesn't show in the response downloaded with the code you provided:
>>> 'window.utag_data' in str(response)
False
You can try to replay the request as if it were made by a browser (e.g. by sending a browser user agent).
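As a minimal sketch of that suggestion, the request can be sent with a browser-like User-Agent header and the returned HTML searched for the script in question. The header value, the timeout, and the helper names fetch_html and find_utag_script are assumptions made for illustration; whether the site then returns the same HTML as it does to a real browser has not been verified here.

```python
import requests
from bs4 import BeautifulSoup

# Assumed browser-like User-Agent string; any current browser UA could be substituted.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}


def fetch_html(url):
    """Download the page while presenting a browser-like User-Agent (hypothetical helper)."""
    resp = requests.get(url, headers=BROWSER_HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.text


def find_utag_script(html):
    """Return the text of the first <script> mentioning window.utag_data, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # .string is None for scripts that only carry a src attribute.
        text = script.string or ""
        if "window.utag_data" in text:
            return text
    return None
```

Calling find_utag_script(fetch_html("https://www.booking.com/hotel/br/olympia-residence.en-gb.html")) would then return the script body if the server includes it for that user agent, or None if it still does not. If it returns None, the site may be keying on other headers or cookies as well.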