
How to send requests to a downloaded html file

I have a .html file downloaded and want to send a request to this file to grab its content.

However, if I do the following:

import requests
html_file  = "/user/some_html.html"
r = requests.get(html_file)

This gives the following error:

Invalid URL 'some_html.html': No schema supplied.

If I add a schema, I get the following error:

HTTPConnectionPool(host='some_html.html', port=80): Max retries exceeded with url:

I want to know how to send a request specifically to an HTML file once it has been downloaded.
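Both error messages can be explained by how the URL is split into parts. Here is a minimal sketch using the standard library's urllib.parse.urlparse (not something the question itself uses), assuming the schema was added as http://some_html.html, which is what the host in the second error suggests:

```python
from urllib.parse import urlparse

# The path from the question has no scheme, so an HTTP client cannot tell
# which protocol (http, https, ...) to use -> "No schema supplied".
no_scheme = urlparse("/user/some_html.html")
print(repr(no_scheme.scheme), repr(no_scheme.netloc), no_scheme.path)

# Prefixing "http://" turns the file name into the *host* part of the URL,
# so requests tries to reach a server called "some_html.html" (port 80 is
# the default for http) -> "Max retries exceeded".
with_scheme = urlparse("http://some_html.html")
print(with_scheme.scheme, with_scheme.netloc, repr(with_scheme.path))
```

In other words, in the second attempt the file name is treated as the name of a server on the network.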

You are accessing the HTML file from a local directory. The get() method opens an HTTP connection (on port 80 for http:// URLs) to fetch data from a website, not from a local directory. To access a local file with get(), you would need to serve it through a local web server such as XAMPP or WAMP. Otherwise, to read a file from a local directory you can simply use open(); requests.get() is for fetching resources over HTTP, in simple words from the internet, not from a local directory.

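# No need for requests here: the built-in open() reads a local file directly
html_file = "/user/some_html.html"
# "with" closes the file automatically; utf-8 is assumed as the file's encoding
with open(html_file, "r", encoding="utf-8") as t:
    for v in t:
        print(v, end="")  # each line already ends with "\n"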

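As a dependency-free alternative to XAMPP or WAMP, Python's built-in http.server can serve a folder over HTTP. The sketch below is an illustration under stated assumptions, not the answer's method: it writes a hypothetical some_html.html into a temporary folder (standing in for /user), serves that folder on a free local port, and fetches the page with urllib.request from the standard library. Once such a server is running, requests.get(url).text would return the same text.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path

# Hypothetical stand-in for the downloaded file, in a temporary folder.
site_dir = Path(tempfile.mkdtemp())
(site_dir / "some_html.html").write_text(
    "<html><body><h1>Hello</h1></body></html>", encoding="utf-8"
)

# Serve that folder over HTTP, the role XAMPP/WAMP (or Live Server) would play.
# Port 0 asks the operating system for any free port.
handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory=str(site_dir)
)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The file now has an http:// URL, so an HTTP client can request it.
url = f"http://127.0.0.1:{port}/some_html.html"
with urllib.request.urlopen(url) as resp:
    status = resp.status
    body = resp.read().decode("utf-8")

# Stop the background server and release the port.
server.shutdown()
server.server_close()
print(status, body)
```

From a terminal, running python -m http.server 8000 inside the folder that holds the file does the same job, after which requests.get("http://127.0.0.1:8000/some_html.html") can fetch it.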

You don't "send a request to an HTML file". Instead, you send a request to an HTTP server on the internet, which returns a response containing the contents of an HTML file.

The file itself knows nothing about "requests". If you have the file stored locally and want to do something with it, then you can open it just like any other file.
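A minimal sketch of opening the saved file like any other file, assuming a hypothetical temporary copy named some_html.html. It also shows that, if a URL is really wanted, the standard library's urllib.request can read file:// URLs, whereas requests has no built-in adapter for them:

```python
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical local copy of the page (stands in for /user/some_html.html).
html_file = Path(tempfile.mkdtemp()) / "some_html.html"
html_file.write_text("<html><title>Saved page</title></html>", encoding="utf-8")

# 1) Simplest: read it like any other text file, no HTTP involved.
content = html_file.read_text(encoding="utf-8")

# 2) If a URL is really wanted, urllib.request understands file:// URLs.
#    Path.as_uri() requires an absolute path, hence resolve().
file_url = html_file.resolve().as_uri()
with urllib.request.urlopen(file_url) as resp:
    via_url = resp.read().decode("utf-8")

print(file_url)
print(content == via_url)
```

Either way no server is involved: the file is simply read from disk.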

If you are interested in learning more about the request and response model, I suggest you try something like:

import requests
response = requests.get("http://stackoverflow.com")

You should also read about HTTP requests and responses to better understand how this works.
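To make the request and response model concrete, here is a simplified sketch of the raw text that travels over an HTTP/1.1 connection. The helper names build_get_request and parse_response and the sample response are hypothetical; real clients such as requests also handle Content-Length, chunked bodies, redirects and much more. Note that the request names a host (a server) and a path; the HTML only comes back as the body of the server's response.

```python
from __future__ import annotations


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, path on the server, protocol version
        f"Host: {host}",         # the request is addressed to a server
        "Connection: close",
        "",                      # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], str]:
    """Split a raw HTTP/1.1 response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])  # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body.decode("utf-8")


request = build_get_request("stackoverflow.com")
print(request.decode("ascii"))

# Hypothetical response a server might send back.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)
status, headers, body = parse_response(raw_response)
print(status, headers["content-type"], body)
```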

You can do it by setting up a local server for your HTML file. If you use Visual Studio Code, you can install the Live Server extension by Ritwick Dey.

Then do the following:

1 - Make the first request and save the HTML content into a .html file:

my_req.py

import requests

file_path = './'
file_name = 'my_file'

url = "https://www.qwant.com/"

response = requests.request("GET", url)

# "with" makes sure the file is flushed and closed after writing
with open(file_path + file_name + '.html', 'w', encoding='utf-8') as w:
    w.write(response.text)

2 - With Live Server installed in Visual Studio Code, open my_file.html and then click Go Live.


3 - Now you can make a request to the local http:// URL served by Live Server:

second request

import requests

url = "http://127.0.0.1:5500/my_file.html"

response = requests.request("GET", url)

print(response.text)

And, ta-da! Do what you need to do.

In a crawler project, I had a situation where the content displayed on the website differed from the content retrieved with response.text, so the XPaths did not match the ones on the website. I needed to download the content, save it as a local HTML file, and work out the new XPaths to get the information I needed.
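Once the page is saved, it can also be inspected offline, without any server, to see what the crawler actually received (the browser's view can differ, for example when it runs JavaScript). If lxml is installed, lxml.html.fromstring(text).xpath(...) lets you test XPath expressions directly against the saved text. The sketch below sticks to the standard library's html.parser instead, so it matches elements by tag name rather than by XPath; the class name TagTextCollector and the sample page are hypothetical.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class TagTextCollector(HTMLParser):
    """Collect the text inside every element with the given tag name."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while inside a matching element
        self.results = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Closing the outermost match: store its accumulated text.
                self.results.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


# Hypothetical saved copy, as my_req.py would write it (my_file.html).
saved = Path(tempfile.mkdtemp()) / "my_file.html"
saved.write_text(
    "<html><body><h2>First</h2><p>x</p><h2>Second <b>item</b></h2></body></html>",
    encoding="utf-8",
)

collector = TagTextCollector("h2")
collector.feed(saved.read_text(encoding="utf-8"))
collector.close()
print(collector.results)
```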

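Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.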

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM