How to get (decode) HTML source code in Python

Question

I am trying to get the source code of a webpage (one listing the current foreign exchange rates) in Python 2.7.13. Normally this is no problem with requests.get(url, headers) and so on. In this case I can download the page, but some parts seem to be encoded (base64?).

However, when I visit the page in a browser and view the source, the correct (decoded) code is shown. The question is: how can I get the decoded page source? The URL is: https://www.isbank.com.tr/en/foreign-exchange-rates

Part of the code I use is:

import requests

url = "https://www.isbank.com.tr/en/foreign-exchange-rates"
resp = requests.get(url)  # no custom headers are sent here
out = resp.text

Answer 1

The response contains text in Turkish saying that the request was rejected because of "unusual traffic detected from your device". It seems that the site checks the User-Agent header to keep simple scripts from crawling it. You can bypass the check by sending a plausible, browser-like header:

import requests

url = 'https://www.isbank.com.tr/en/foreign-exchange-rates'
ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'  # browser-like User-Agent
resp = requests.get(url, headers={'User-Agent': ua})
out = resp.text
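
If requests is not available, the same idea can be sketched with the standard library alone. This is only an illustration under a few assumptions: it targets Python 3 rather than the 2.7.13 mentioned in the question, the names BROWSER_UA, build_request and fetch_html are hypothetical helpers, and whether this particular User-Agent string keeps working depends on the site.

```python
import urllib.request

# Assumed browser-like User-Agent string, copied from the answer above.
BROWSER_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'


def build_request(url, user_agent=BROWSER_UA):
    """Return a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=10):
    """Download the page and decode it with the charset the server declares.

    Falls back to UTF-8 when the response does not declare a charset.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')
```

Calling fetch_html('https://www.isbank.com.tr/en/foreign-exchange-rates') should then return the page text. If the rejection message still appears, the site may be checking more than the User-Agent header.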

Solution 1
0 2022-09-17 09:34:10
