
Sync local file with HTTP server location (in Python)

Original post: 2011-10-01 23:25:55 1 3 python/ http/ httpclient/ urllib2/ if-modified-since

I have an HTTP server that hosts a large file, and Python clients (GUI apps) that download it.
I want the clients to download the file only when needed, but to have an up-to-date file on each run.

My plan is for each client to request the file on each run using the If-Modified-Since HTTP header, set to the modification time of the existing local file, if any. Can someone suggest how to do this in Python?

Can someone suggest an alternative, easy way to achieve my goal?

3 answers

You can add a header called ETag (a hash of your file, e.g. md5sum or sha256) to check whether the two files differ, instead of comparing the last-modified date.
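A minimal sketch of how a client could use an ETag, assuming Python 3's urllib.request (the question's urllib2 tag suggests Python 2, where the equivalent calls live in urllib2) and assuming the server honours the If-None-Match header. The function names and the idea of storing the last seen ETag between runs are illustrative assumptions, not part of the answer.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None):
    """Build a GET request that sends If-None-Match when an ETag is known."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, etag=None):
    """Return (body, new_etag), or (None, etag) when the server answers 304.

    Assumption: the caller persists the returned ETag between runs.
    The body is read into memory here; a very large file would be
    better streamed to disk.
    """
    request = build_conditional_request(url, etag)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the local copy is still current
            return None, etag
        raise
```

urllib reports a 304 response as an HTTPError, which is why the "not modified" case is handled in the except branch.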

I'm assuming a few things here, but one solution would be to have a separate file on the server (check.php) that creates a hash/checksum of each file you're hosting. If the checksum differs from that of the local file, the client downloads the file. This means that if the content of the file on the server changes, the client will notice, because the checksum will differ.

Compute an MD5 hash of the file contents, store it in a database or similar, and check against it before downloading anything.

Your solution would work too, but it requires the server to actually include the "modified" date in the headers of the response to the GET request (some server software does not do this).
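For comparison, the If-Modified-Since approach from the question might look roughly like this. It is a sketch under assumptions: Python 3's urllib.request instead of urllib2, a server that answers 304 Not Modified, and a client clock that is reasonably accurate (the local file's mtime is the download time, not the server's modification time). The helper names and the ".part" temporary suffix are made up for illustration.

```python
import os
import shutil
import urllib.error
import urllib.request
from email.utils import formatdate


def if_modified_since_value(path):
    """Return the file's mtime as an HTTP date, or None if the file is missing."""
    if not os.path.exists(path):
        return None
    return formatdate(os.path.getmtime(path), usegmt=True)


def sync_file(url, path):
    """Download url to path unless the server reports 304 Not Modified.

    Returns True if a new copy was written, False if the local copy was kept.
    """
    request = urllib.request.Request(url)
    stamp = if_modified_since_value(path)
    if stamp:
        request.add_header("If-Modified-Since", stamp)
    tmp_path = path + ".part"  # hypothetical temp name, avoids clobbering on failure
    try:
        with urllib.request.urlopen(request) as response, open(tmp_path, "wb") as out:
            shutil.copyfileobj(response, out)  # stream, so large files stay off the heap
    except urllib.error.HTTPError as err:
        if err.code == 304:  # local copy is still current
            return False
        raise
    os.replace(tmp_path, path)
    return True
```

Calling sync_file(url, path) at startup would then download the file only when it is missing locally or the server reports a newer version.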

I'd suggest setting up a database that looks something like this:

[ID]  [File_name]  [File_hash]
0001  moo.txt      asd124kJKJhj124kjh12j
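On the client side, the checksum comparison could be sketched as follows. This assumes the client has already obtained the expected MD5 hex digest from the server by some means (the check.php script or database lookup is not shown), and the function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_hash):
    """Return True if the local file is missing or its MD5 differs."""
    try:
        return md5_of_file(path) != expected_hash.lower()
    except FileNotFoundError:
        return True
```

Reading in chunks keeps memory use flat for large files. MD5 is adequate here for change detection, but not for protecting against deliberate tampering.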

It seems to me the easiest solution is to host the file in Mercurial and use the Mercurial API to find the file's hash, downloading the file if the hash has changed. Calculating the hash can be done as in the answer to this question; for downloading the file, urllib will be enough.