Save the whole web page using Python requests instead of just the basic HTML

Question

So I want to use Beautiful Soup to scrape this page: https://www.nseindia.com/option-chain#optionchain_equity and I access it using the requests module. But I guess requests saves only the basic HTML, not the main table on that page. Using Chrome to download "Webpage, Complete" works, but how can I automate that in Python? Also, without those headers the request times out, so I guess they are necessary. Code:

import requests

url = "https://www.nseindia.com/option-chain#optionchain_equity"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/80.0.3987.149 Safari/537.36',
           'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'}
response = requests.get(url, headers=headers, timeout=5)
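# Write the returned HTML to disk; the with-block closes the file afterwards.
with open("nse.html", "w", encoding="utf-8") as file:
    file.write(response.text)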

Answer 1

If you are mainly after the table data, note that it is not part of the initial HTML: the page loads it through a separate AJAX call, which you can request directly.

The following script fetches that data and saves it to a JSON file.

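import json

import requests

# Browser-like user agent; the site does not answer the default requests one.
headers = {'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"}

# The endpoint the page itself calls to fill the option-chain table.
res = requests.get("https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY", headers=headers, timeout=10)
res.raise_for_status()  # stop here on an HTTP error instead of saving an error page

with open("data.json", "w") as f:
    json.dump(res.json(), f)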
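
Since the goal was the table rather than raw JSON, the saved file can then be flattened into rows. The following is only a sketch: the key names it reads ("records", "data", "strikePrice", "expiryDate", "CE", "PE", "lastPrice") are assumptions about the shape of the response and should be checked against the actual data.json; flatten_option_chain and write_rows_csv are hypothetical helper names.

```python
import csv

# Column order for the CSV; the names mirror the assumed JSON keys.
FIELDNAMES = ["strikePrice", "expiryDate", "call_lastPrice", "put_lastPrice"]


def flatten_option_chain(payload):
    """Turn the nested option-chain JSON into one flat dict per entry."""
    rows = []
    # ASSUMPTION: entries live under payload["records"]["data"].
    for entry in payload.get("records", {}).get("data", []):
        # ASSUMPTION: "CE" holds the call side, "PE" the put side;
        # either one may be missing for a given strike.
        call = entry.get("CE") or {}
        put = entry.get("PE") or {}
        rows.append({
            "strikePrice": entry.get("strikePrice"),
            "expiryDate": entry.get("expiryDate"),
            "call_lastPrice": call.get("lastPrice"),
            "put_lastPrice": put.get("lastPrice"),
        })
    return rows


def write_rows_csv(rows, path):
    """Write the flattened rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)

# Usage, after data.json has been saved by the script above:
#   import json
#   with open("data.json", encoding="utf-8") as f:
#       write_rows_csv(flatten_option_chain(json.load(f)), "option_chain.csv")
```

Missing keys come out as empty cells instead of raising, so a mismatch with the real structure shows up as blank columns.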

Answer 2

If you want to save a whole web page, you could look for something like a headless Chrome API, for example:

Download file through Google Chrome in headless mode

To interpret a web page, plain Python alone won't help: it just handles the response as a stream it reads like a file. What you want is both the file read and the web browser's behavior, so a headless Chrome API is the way to go.
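
One way this could look, as a rough sketch and not necessarily what the linked thread does, is to call Chrome's own command line from Python: `--headless` runs it without a window and `--dump-dom` prints the page's DOM after the browser has loaded it. The binary names looked up on PATH ("google-chrome", "chromium") are assumptions about the local install, and build_dump_dom_command and save_rendered_page are hypothetical helper names.

```python
import shutil
import subprocess


def build_dump_dom_command(chrome_path, url, user_agent=None):
    """Build the argument list for a headless Chrome run that prints the DOM."""
    command = [chrome_path, "--headless", "--disable-gpu"]
    if user_agent:
        # Optional: present a regular browser user agent to the site.
        command.append(f"--user-agent={user_agent}")
    command += ["--dump-dom", url]
    return command


def save_rendered_page(url, out_path, chrome_path=None, user_agent=None, timeout=60):
    """Run headless Chrome on url and save the dumped DOM to out_path."""
    # ASSUMPTION: Chrome or Chromium is installed under one of these names.
    chrome_path = chrome_path or shutil.which("google-chrome") or shutil.which("chromium")
    if chrome_path is None:
        raise FileNotFoundError("No Chrome/Chromium binary found on PATH")
    result = subprocess.run(
        build_dump_dom_command(chrome_path, url, user_agent),
        capture_output=True,
        encoding="utf-8",
        timeout=timeout,
        check=True,  # raise if Chrome exits with an error
    )
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(result.stdout)
    return out_path

# Usage:
#   save_rendered_page("https://www.nseindia.com/option-chain", "nse_rendered.html")
```

The dumped DOM is not guaranteed to include data that arrives after the page's initial load, so it is worth checking the saved file for the table before parsing it with Beautiful Soup.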
