How can I access this script element's data?

Question

I'm trying to use requests and BeautifulSoup to access some metadata on a page.

Some script elements can be accessed, but one in particular seemingly cannot.

For example:

import requests
from bs4 import BeautifulSoup

response = BeautifulSoup(requests.get("https://www.booking.com/hotel/br/olympia-residence.en-gb.html", verify=False).content, "html.parser")

scriptData = response.select('script[type="text/javascript"]')

In the HTML, there is a script element with a window.utag_data variable, but scriptData only contains data from another script element.

I thought the script element might be missing because it is loaded dynamically on the page, but if that's the case, I couldn't work out which response delivers that data.

Is it possible to get window.utag_data with requests and BeautifulSoup?

Answer 1

It seems the website sends different HTML depending on how the page is requested.

I can see window.utag_data if I open the page in a browser, but not if I fetch it with curl:

$ curl -s https://www.booking.com/hotel/br/olympia-residence.en-gb.html | grep utag_data
$

It also doesn't show up in the response downloaded with the code you provided:

>>> 'window.utag_data' in str(response)
False

You can try to replay the request as if it were made by a browser (e.g. by sending a browser user agent).
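
As a rough sketch of that suggestion, the request could be sent with a browser-like User-Agent header and the returned HTML searched for the script in question. The User-Agent string, the timeout, and the helper names fetch_html and find_script_containing are illustrative assumptions, not something taken from the site or from the question, and there is no guarantee the site will respond with the browser version of the page.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.booking.com/hotel/br/olympia-residence.en-gb.html"

# Assumption: an example desktop-browser User-Agent string; the exact
# value is illustrative and may need to be changed.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
}


def fetch_html(url, headers=BROWSER_HEADERS, timeout=10):
    """Download a page while presenting browser-like headers (hypothetical helper)."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


def find_script_containing(html, needle="window.utag_data"):
    """Return the text of the first <script> whose body contains needle, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # .string is None for empty scripts or scripts that only have a src attribute.
        text = script.string or ""
        if needle in text:
            return text
    return None
```

Calling find_script_containing(fetch_html(URL)) would then return the script body if the server includes it for this request, or None if it still sends the reduced HTML. Searching every script element, rather than only those with type="text/javascript", avoids missing a script that omits the type attribute.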

Answer 1: accepted, 2017-04-08 03:42:07
