Uploading large zip files to a website using Python

I have the following problem: I need to upload large .zip files (usually >500 MB, up to about 5 GB) to a website, which then processes them. I do this in Python 2.7.16 on 32-bit Windows. Sadly, due to company restrictions I can neither change my setup (from 32-bit to 64-bit) nor install additional Python packages (I have requests, urllib, urllib2 and several others installed). My code currently looks like this:

import requests

# Raw strings keep the backslashes in the Windows paths literal
FileList = [r"C:\File01.zip", r"C:\FileA02.zip", r"C:\UserFile993.zip"]
UploadURL = "https://mywebsite.com/submitFile"
for FilePath in FileList:
    print("Upload file: " + str(FilePath))
    session = requests.Session()
    with open(FilePath, "rb") as file:
        # Same call as quoted in the traceback below
        session.post(UploadURL, data={'file': 'Send file'}, files={'FileToBeUploaded': FilePath})
    print("Upload done: " + str(FilePath))
    session.close()

Since my FileList is quite long (>100 entries), I have pasted only an excerpt of it here. The code above works well for files below 600 MB. Any file larger than that gives me this error:

  File "<stdin>", line 1, in <module>
  File "C:\Users\AAA253\Desktop\DingDong.py", line 39, in <module>
    session.post(UploadURL,data={'file':'Send file'},files={'FileToBeUploaded':FilePath})
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 522, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 461, in request
    prep = self.prepare_request(req)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 394, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Python27\lib\site-packages\requests\models.py", line 297, in prepare
    self.prepare_body(data, files, json)
  File "C:\Python27\lib\site-packages\requests\models.py", line 455, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "C:\Python27\lib\site-packages\requests\models.py", line 158, in _encode_files
    body, content_type = encode_multipart_formdata(new_fields)
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\filepost.py", line 86, in encode_multipart_formdata
    body.write(data)
MemoryError

I have already searched the forum here for solutions, but sadly could not find a suitable one. Does anybody have an idea how to get this done? Could it be done by loading the file in chunks? If so, how do I upload the file in chunks so that the server does not "cancel" the operation?

Edit: using the answer from @AKX, I now use this code:

import requests
from requests_toolbelt.multipart import encoder

FileList = [r"C:\File01.zip", r"C:\FileA02.zip", r"C:\UserFile993.zip"]
UploadURL = "https://mywebsite.com/submitFile"
for FilePath in FileList:
    session = requests.Session()
    with open(FilePath, 'rb') as f:
        form = encoder.MultipartEncoder({"documents": (FilePath, f, "application/octet-stream"), "composite": "NONE"})
        headers = {"Prefer": "respond-async", "Content-Type": form.content_type}
        resp = session.post(UploadURL, data={'file': 'Send file'}, files={'FileToBeUploaded': form})
    session.close()

Nevertheless, I get nearly the same error:

  File "<stdin>", line 1, in <module>
  File "C:\Users\AAA253\Desktop\DingDong.py", line 48, in <module>
    resp =  session.post(UploadURL,data={'file':'Send file'},files={'FileToBeUploaded':form})
  File "C:\Python27\lib\site-packages\requests-2.24.0-py2.7.egg\requests\sessions.py", line 578, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\Python27\lib\site-packages\requests-2.24.0-py2.7.egg\requests\sessions.py", line 516, in request
    prep = self.prepare_request(req)
  File "C:\Python27\lib\site-packages\requests-2.24.0-py2.7.egg\requests\sessions.py", line 459, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Python27\lib\site-packages\requests-2.24.0-py2.7.egg\requests\models.py", line 317, in prepare
    self.prepare_body(data, files, json)
  File "C:\Python27\lib\site-packages\requests-2.24.0-py2.7.egg\requests\models.py", line 505, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "C:\Python27\lib\site-packages\requests-2.24.0-py2.7.egg\requests\models.py", line 159, in _encode_files
    fdata = fp.read()
  File "build\bdist.win32\egg\requests_toolbelt\multipart\encoder.py", line 314, in read
  File "build\bdist.win32\egg\requests_toolbelt\multipart\encoder.py", line 194, in _load
  File "build\bdist.win32\egg\requests_toolbelt\multipart\encoder.py", line 256, in _write
  File "build\bdist.win32\egg\requests_toolbelt\multipart\encoder.py", line 552, in append
MemoryError

You will most likely need the streaming MultipartEncoder from requests-toolbelt.

Even if your company's restrictions forbid installing new packages, you can likely vendor the parts of requests_toolbelt you need (maybe the whole package) into your project's directory.
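As a rough illustration of vendoring, assuming the `requests_toolbelt` package folder has been copied into a hypothetical `vendor` subdirectory next to the upload script: the helper below (its name, `add_vendor_dir`, is made up) puts that directory on `sys.path` so the import resolves without an install. If the package folder is copied directly beside the script instead, no path change is needed, because the script's own directory is already on `sys.path`.

```python
import os
import sys


def add_vendor_dir(project_dir, name="vendor"):
    """Put <project_dir>/<name> at the front of sys.path once and return it."""
    vendor_dir = os.path.join(os.path.abspath(project_dir), name)
    if vendor_dir not in sys.path:
        sys.path.insert(0, vendor_dir)
    return vendor_dir


# Possible use at the top of the upload script, before the import
# of the vendored package (the "vendor" folder name is only a convention):
# add_vendor_dir(os.path.dirname(os.path.abspath(__file__)))
# from requests_toolbelt.multipart.encoder import MultipartEncoder
```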
