
How to download a file from a website

This is my first question, so please be nice if I am, in any way, doing it wrong.

I am using the requests module in Python 3.3 to automate file downloads from a few sites, but this one in particular is giving me trouble when I attempt to get the CSV file. I have a workable level of competence in Python but am unacquainted with HTML and JavaScript as far as website interaction is concerned.

Here is the relevant code.

import requests
import datetime

now = datetime.datetime.now().strftime("%Y%m%d")

folder = 'some path'

url = 'https://gats.pjm-eis.com/gats2/PublicReports/RenewableGeneratorsRegisteredInGATS/'#ExportTo'
payload = {'exportType' : 'CSV',
           'tabNumber' : ''}
doc = requests.post(url, data=payload, stream=True)

output = open(folder+now+'_GATSRegistered.csv','wb')
output.write(doc.content)
output.close()

I don't get any errors, but the file I end up writing contains an error page instead of the CSV. I have successfully done this for a site where the url pointed directly to the file ('http://www.place.com/path/file.xlsx'), so I know what to do with the file once it's been retrieved. But that simply required a 'get' request.

So, my questions:

  • What is the correct request to post?
  • Is post even the right thing to do?
  • Is this a special case or something I should know how to address in general?
  • Anything else I should be doing differently?

I looked at the page in Chrome and opened the developer console with the Network tab selected. There you can see that clicking the "CSV" button sends a POST request with a lot of form data:

exportType:CSV
tabNumber:
CSV_CH:1
PRN_CH:0
GridView$DXFREditorcol0:
GridView$DXFREditorcol1:
GridView$DXFREditorcol2:
GridView$DXFREditorcol3:
GridView$DXFREditorcol4:
GridView$DXFREditorcol5:
GridView$DXFREditorcol6:
GridView$DXFREditorcol7:
GridView$DXFREditorcol8:
GridView$DXFREditorcol9:
GridView$DXFREditorcol10:
GridView$DXFREditorcol11:
GridView$DXFREditorcol12:
GridView$DXFREditorcol13:
GridView$DXFREditorcol14:
GridView$DXFREditorcol15:
GridView$DXFREditorcol16:
GridView$DXFREditorcol17:
GridView$DXFREditorcol18:
GridView$DXFREditorcol19:
GridView$DXFREditorcol20:
GridView$DXFREditorcol21:
GridView$DXFREditorcol22:
GridView$DXFREditorcol23:
GridView$DXFREditorcol24:
GridView$DXFREditorcol25:
GridView$DXFREditorcol26:
GridView_custwindowWS:0:0:-1:-10000:-10000:0:1px:-10000:1:0:0:0
GridView_DXHFPWS:0:0:-1:-10000:-10000:0:180px:100px:1:0:0:0
GridView_DXPagerBottom_PSPSI:2
GridView$DXSelInput:
GridView$DXKVInput:[]
GridView$CallbackState:BwMHAQIFU3RhdGUGEAEHGwcAAgEHAQIBBwICAQcDAgEHBAIBBwUCAQcGAgEHBwIBBwgCAQcJAgEHCgIBBwsCAQcMAgEHDQIBBw4CAQcPAgEHEAIBBxECAQcSAgEHEwIBBxQCAQcVAgEHFgIBBxcCAQcYAgEHGQIBBxoCAQcABxsHAAcABwEHAAcCBwAHAwcABwQHAAcFBwAHBgcABwcHAAcIBwAHCQcABwoHAAcLBwAHDAcABw0HAAcOBwAHDwcABxAHAAcRBwAHEgcABxMHAAcUBwAHFQcABxYHAAcXBwAHGAcABxkHAAcaBwAHAAcAAgAFAAAAgAkCCUVudGl0eUtleQkCAAIAAwcEAgAHAAIBBTaVAAAHAAIBBwAHAAIQRmlsdGVyRXhwcmVzc2lvbgcCAAIIUGFnZVNpemUDBzI=
GridView$DXSyncInput:
GridView_DXFilterRowMenuCI:
DXScript:1_142,1_80,1_135,1_91,14_0,1_90,1_113,14_23,14_10,1_98,1_105,1_77,1_128,1_126,1_124,1_133,1_119,1_127,1_104,1_101,1_84,1_109,1_92,14_1,1_94,1_97,1_95,1_96,1_106,14_4,1_100,1_117,1_103,14_12,14_13,1_102,1_129,1_107,1_137,1_114,14_16,10_2,10_1,10_3,10_4,14_3
DXMVCEditorsValues:{"GridView_DXFREditorcol0":null,"GridView_DXFREditorcol1":null,"GridView_DXFREditorcol2":null,"GridView_DXFREditorcol3":null,"GridView_DXFREditorcol4":null,"GridView_DXFREditorcol5":null,"GridView_DXFREditorcol6":null,"GridView_DXFREditorcol7":null,"GridView_DXFREditorcol8":null,"GridView_DXFREditorcol9":null,"GridView_DXFREditorcol10":null,"GridView_DXFREditorcol11":null,"GridView_DXFREditorcol12":null,"GridView_DXFREditorcol13":null,"GridView_DXFREditorcol14":null,"GridView_DXFREditorcol15":null,"GridView_DXFREditorcol16":null,"GridView_DXFREditorcol17":null,"GridView_DXFREditorcol18":null,"GridView_DXFREditorcol19":null,"GridView_DXFREditorcol20":null,"GridView_DXFREditorcol21":null,"GridView_DXFREditorcol22":null,"GridView_DXFREditorcol23":null,"GridView_DXFREditorcol24":null,"GridView_DXFREditorcol25":null,"GridView_DXFREditorcol26":null}

You will have to experiment to see which of the above fields the server actually requires. I doubt all of them are needed (but I've been wrong plenty :) ).
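Typing all of those fields by hand is error-prone, so here is a minimal sketch of one way to turn the text copied from the network tab into a dict suitable for data=. The helper name parse_form_dump is made up for this example, and it assumes each field sits on its own line in name:value form, as in the listing above.

```python
def parse_form_dump(dump):
    """Turn 'name:value' lines copied from the browser's network tab into a dict."""
    payload = {}
    for line in dump.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first colon only: values such as
        # 'GridView_custwindowWS:0:0:-1:...' contain further colons.
        name, _, value = line.partition(":")
        payload[name.strip()] = value.strip()
    return payload


# A few of the fields listed above, pasted verbatim.
dump = """exportType:CSV
tabNumber:
CSV_CH:1
PRN_CH:0
GridView$DXKVInput:[]
GridView_DXPagerBottom_PSPSI:2
GridView_custwindowWS:0:0:-1:-10000:-10000:0:1px:-10000:1:0:0:0"""

payload = parse_form_dump(dump)
```

With the full dump in a dict, you can delete entries one at a time and re-send the request to find out which ones the server insists on.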

That said, when using stream=True, you should use iter_content, so your code would look like:

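payload = {
    # Form contents
}
r = requests.post(url, data=payload, stream=True)
# filename is the path of the output file
with open(filename, 'wb') as output:
    # Read the body in blocks; the default chunk_size of 1 byte is slow
    for chunk in r.iter_content(chunk_size=8192):
        output.write(chunk)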

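The for-loop writes each chunk to your file as it becomes available, so the whole response never has to sit in memory. Note that streaming by itself does not guard against a stalled connection; pass a timeout to requests.post if you are worried about the download hanging.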
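Since the original problem was an error page being saved silently as a .csv, it may also help to check the response before writing anything. The following is only a sketch: save_stream is a hypothetical helper, and treating a text/html content type as "this is an error page" is an assumption about how this particular server behaves, not something confirmed above.

```python
def save_stream(response, path, chunk_size=8192):
    """Write a streamed response to `path`, refusing obvious error pages.

    `response` is expected to behave like a requests.Response obtained
    with stream=True. Returns the number of bytes written.
    """
    # For a real requests.Response this raises on 4xx/5xx status codes.
    response.raise_for_status()

    # Assumption: an HTML content type means the server sent a page
    # (for example an error page) rather than the requested file.
    content_type = response.headers.get("Content-Type", "")
    if "text/html" in content_type.lower():
        raise ValueError("expected a file, got HTML: %r" % content_type)

    written = 0
    with open(path, "wb") as output:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                output.write(chunk)
                written += len(chunk)
    return written


# Possible usage with the names from the question (not run here):
#   r = requests.post(url, data=payload, stream=True, timeout=30)
#   save_stream(r, folder + now + '_GATSRegistered.csv')
```

If the ValueError fires, printing r.text usually shows what the server is complaining about.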


 