
When does urllib2 actually download a file from a URL?

import urllib2  # Python 2 standard library

url = "http://example.com/file.xml"
data = urllib2.urlopen(url)
data.read()

The question is, when exactly is the file downloaded from the internet: when I call urlopen(), or when I call .read()? On my network interface I see high traffic both times.

Without looking at the code, I'd expect that the following happens:

  1. urlopen() opens the connection and sends the request. Then the server starts feeding the reply. At this point, the data accumulates in buffers until they are full and the operating system tells the server to hold on for a while.
  2. Then data.read() empties the buffers, so the operating system tells the server to go on, and the rest of the reply gets downloaded.

Naturally, if the reply is short enough, or if the .read() happens quickly enough, then the buffers do not have time to fill up and the download happens in one go.
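The two steps above can be observed with a small local experiment. The following is only a sketch under stated assumptions: it uses Python 3's urllib.request (the successor of urllib2) so that it runs on a current interpreter, and the one-shot server, the helper name serve_once, the events release_body and body_sent, and the payload are all made up for illustration. The server sends the response headers, then holds the body back until the client says it is about to call read().

```python
import socket
import threading
import urllib.request  # Python 3 successor of urllib2 (assumption: Python 3 is used)

release_body = threading.Event()  # set by the client just before it calls read()
body_sent = threading.Event()     # set by the server once the body is on the wire
BODY = b"<xml>payload</xml>"      # made-up payload for the illustration


def serve_once(listener):
    """Hypothetical one-shot server: headers first, body only when released."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # read until the end of the request headers
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        headers = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Length: {}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).format(len(BODY))
        conn.sendall(headers.encode("ascii"))
        release_body.wait(timeout=5)  # hold the body back for now
        conn.sendall(BODY)
        body_sent.set()


listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_once, args=(listener,))
server.start()

# Same machinery as urlopen(), but with proxies disabled so that the
# request really goes to the local test server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
response = opener.open("http://127.0.0.1:{}/file.xml".format(port))

status = response.status
body_sent_early = body_sent.is_set()  # had the body been sent when open() returned?

release_body.set()      # now let the server send the body
data = response.read()  # read() pulls the body off the connection
server.join()
listener.close()

print(status, body_sent_early, data)
```

In this setup the open call returns as soon as the status line and headers have arrived, and the body only crosses the connection once read() is under way. With a real server nothing holds the body back, so part of it may already sit in the socket buffers by the time read() is called, which would match the traffic seen at both calls.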

I agree with ddaa. However, if you want to understand this sort of thing, you can set up a dummy server using something like nc (on *nix) and then open the URL in the interactive Python interpreter.

In one terminal, run nc -l 1234, which will open a socket and listen for connections on port 1234 of the local machine. nc will accept an incoming connection and display whatever it reads from the socket. Anything you type into nc will be sent over the socket to the remote end of the connection, in this case Python's urlopen().

Run Python in another terminal and enter your code, i.e.

import urllib2
data = urllib2.urlopen('http://127.0.0.1:1234')
data.read()

The call to urlopen() will establish the connection to the server, send the request, and then block waiting for a response. You will see that nc prints the HTTP request to its terminal.

Now type something into the terminal that is running nc. The call to urlopen() will still block until you press ENTER in nc, that is, until it receives a newline character. So urlopen() will not return until it has read at least one newline character. (For those concerned about possible buffering by nc, this is not an issue: urlopen() will block until it sees the first newline character.)

So it should be noted that urlopen() will block until the first newline character is received, after which data can be read from the connection. In practice, an HTTP response starts with a short status line followed by a few header lines, so urlopen() should return quite quickly.
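For readers without nc at hand, the experiment can be approximated in a single script. This is a rough sketch, not a faithful nc replacement: it assumes Python 3's urllib.request in place of urllib2, and the thread function fake_nc, the DELAY value, and the canned reply are invented for the illustration. The stand-in server records the request, waits a moment before answering (playing the part of the person typing into nc), and the client measures how long the open call blocks.

```python
import socket
import threading
import time
import urllib.request  # Python 3 successor of urllib2 (assumption: Python 3 is used)

DELAY = 0.5    # seconds the stand-in server waits before replying (arbitrary)
captured = {}  # filled in by the server thread


def fake_nc(listener):
    """Hypothetical stand-in for `nc -l`: show the request, reply after a pause."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # read until the end of the request headers
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        captured["request"] = request.decode("latin-1")
        time.sleep(DELAY)  # nothing is sent yet, so the client keeps waiting
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n")
    # leaving the with-block closes the socket, which marks the end of the body


listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=fake_nc, args=(listener,))
server.start()

# Same machinery as urlopen(), with proxies disabled for the local test server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
start = time.monotonic()
response = opener.open("http://127.0.0.1:{}/".format(port))
blocked_for = time.monotonic() - start  # time spent waiting for the reply to begin
body = response.read()
server.join()
listener.close()

request_line = captured["request"].splitlines()[0]
print(request_line)
print("open() blocked for about {:.2f}s".format(blocked_for))
print(body)
```

The measured blocking time should track DELAY, since the open call cannot return before the server starts replying. Raising DELAY makes the wait longer in the same way that hesitating before pressing ENTER in nc does.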
