How to download files from Google Drive using For Loop through API

When I retrieve CSV files from Google Drive via the API, the downloaded files have no contents.
The code below consists of 3 parts (1: authenticate, 2: search for files, 3: download files).
I suspect something is wrong in step 3 (download files), specifically around while done is False, because I have no problem accessing Google Drive and the files do get created locally. It's just that they are all empty.

It would be great if someone could show me how to fix it. The code below is mostly borrowed from the Google website. Thank you in advance for your time!

Step 1: Authentication

from apiclient import discovery
from httplib2 import Http
import oauth2client
from oauth2client import file, client, tools
obj = lambda: None # a function object used as an empty container that attributes can be set on
auth = {"auth_host_name":'localhost', 'noauth_local_webserver':'store_true', 'auth_host_port':[8080, 8090], 'logging_level':'ERROR'}
for k, v in auth.items():
    setattr(obj, k, v)

scopes = 'https://www.googleapis.com/auth/drive'
store = file.Storage('token_google_drive2.json')
creds = store.get()
# The following takes the user to an authentication link if no valid token file is found.
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_id.json', scopes)
    creds = tools.run_flow(flow, store, obj)
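
The obj = lambda: None line in Step 1 only creates an object that attributes can be attached to. A minimal sketch of the same idea with types.SimpleNamespace from the standard library follows. It assumes run_flow only reads these four attributes from its flags argument. The name flags is hypothetical, and noauth_local_webserver is written as the boolean True on the assumption that the truthy string 'store_true' in the original was meant to switch that option on.

```python
from types import SimpleNamespace

# Hypothetical replacement for the `obj` container built in Step 1.
flags = SimpleNamespace(
    auth_host_name='localhost',
    noauth_local_webserver=True,  # assumption: the original string 'store_true' meant "enabled"
    auth_host_port=[8080, 8090],
    logging_level='ERROR',
)

# It would then be passed where obj is passed: tools.run_flow(flow, store, flags)
print(flags.auth_host_name, flags.auth_host_port)
```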

Step 2: Search for files and create a dictionary of files to download

from googleapiclient.discovery import build

page_token = None
drive_service = build('drive', 'v3', credentials=creds)

# Create the lists once, before the loop, so results from earlier pages are not discarded
name_list = []
id_list = []
while True:
    response = drive_service.files().list(
        q="mimeType='text/csv' and name contains 'RR' and name contains '20191001'",
        spaces='drive',
        fields='nextPageToken, files(id, name)',
        pageToken=page_token).execute()
    # 'item' is used instead of 'file' to avoid shadowing the 'file' module imported in Step 1
    for item in response.get('files', []):
        # Remove ":" from the name, or the downloaded file cannot be found in the folder
        name_list.append(item.get('name').replace(':', ''))
        id_list.append(item.get('id'))
    page_token = response.get('nextPageToken', None)
    if page_token is None:
        break

#### Create dictionary using name_list and id_list
zipobj = zip(name_list, id_list)
temp_dic = dict(zipobj)
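
As a sketch of how Step 2 could be packaged, the function below pages through the results and fills one dictionary directly, which avoids the two intermediate lists. The function name find_files and its parameters are hypothetical. It assumes drive_service behaves like the object returned by build('drive', 'v3', ...) above, and, like the original dictionary, it keeps only one ID when two files share a name.

```python
def find_files(drive_service, query):
    """Return {local file name: Drive file ID} for every file matching query."""
    name_to_id = {}
    page_token = None
    while True:
        response = drive_service.files().list(
            q=query,
            spaces='drive',
            fields='nextPageToken, files(id, name)',
            pageToken=page_token,
        ).execute()
        for item in response.get('files', []):
            # Strip ":" so the name can be used as a local file name.
            safe_name = item['name'].replace(':', '')
            # Files with identical names overwrite each other here.
            name_to_id[safe_name] = item['id']
        page_token = response.get('nextPageToken')
        if page_token is None:
            break
    return name_to_id
```

With the names used in this thread it would be called as temp_dic = find_files(drive_service, "mimeType='text/csv' and name contains 'RR' and name contains '20191001'").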

Step 3: Download Files (the troublesome part)

import io
from googleapiclient.http import MediaIoBaseDownload

for i in range(len(temp_dic.values())):
    file_id = list(temp_dic.values())[i]
    v = list(temp_dic.keys())[i]
    request = drive_service.files().get_media(fileId=file_id)
    fh = io.FileIO(v, mode='w')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
while done is False:
    status, done = downloader.next_chunk()
    status_complete = int(status.progress()*100)
    print(f'Download of {len(temp_dic.values())} files, {int(status.progress()*100)}%')

Actually, I figured it out myself. Below is an edit. All I needed to do was delete the lines done = False and while done is False:, and add fh.close() to close the file handle.

The complete revised part 3 is as follows:

import io
from googleapiclient.http import MediaIoBaseDownload

for i in range(len(temp_dic.values())):

    file_id = list(temp_dic.values())[i]
    v = list(temp_dic.keys())[i]
    request = drive_service.files().get_media(fileId=file_id)

    # v is the local file name (including extension) to write to
    fh = io.FileIO(v, mode='wb') # FileIO always writes raw bytes; 'wb' makes that explicit
    downloader = MediaIoBaseDownload(fh, request)

    # A single next_chunk() call fetches only one chunk, so this completes
    # only when the whole file fits in one chunk
    status, done = downloader.next_chunk()
    status_complete = int(status.progress()*100)
    print(f'{v} is {status_complete}% downloaded')

    fh.close() # close each file before moving on to the next one

print(f'{len(list(temp_dic.keys()))} files')
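
In the code from the question, the while loop is not indented under the for loop, so it appears to run only for the last downloader, which would explain why the other files stay empty. The sketch below keeps the loop but moves it inside the for loop, and uses a with-block so each file is closed as soon as it is written. The function name download_files and the out_dir and make_downloader parameters are hypothetical. make_downloader exists only so the function can be tried with a stand-in, and defaults to MediaIoBaseDownload.

```python
import io
import os


def download_files(drive_service, name_to_id, out_dir='.', make_downloader=None):
    """Download every entry of name_to_id (local name -> Drive file ID) into out_dir."""
    if make_downloader is None:
        # Imported lazily so the function can be exercised with a stand-in downloader.
        from googleapiclient.http import MediaIoBaseDownload
        make_downloader = MediaIoBaseDownload

    for name, file_id in name_to_id.items():
        request = drive_service.files().get_media(fileId=file_id)
        path = os.path.join(out_dir, name)
        # The with-block closes each file as soon as its download finishes.
        with io.FileIO(path, mode='wb') as fh:
            downloader = make_downloader(fh, request)
            done = False
            # This loop is indented inside the for loop, so it runs once per file.
            while not done:
                status, done = downloader.next_chunk()
                print(f'{name} is {int(status.progress() * 100)}% downloaded')
    print(f'{len(name_to_id)} files')
```

With the names used in this thread it would be called as download_files(drive_service, temp_dic).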

I would make a couple of small changes to your code: open the file in 'wb' mode and pass an explicit chunksize to MediaIoBaseDownload:

fh = io.FileIO(v, mode='wb')
downloader = MediaIoBaseDownload(fh, request, chunksize=1024*1024)
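
To illustrate what the chunksize argument implies, here is a rough estimate of how many next_chunk() calls a download takes, assuming each call transfers at most chunksize bytes. The helper name chunks_needed is hypothetical. Any file larger than one chunk needs more than one call, which is why next_chunk() is normally called in a loop until done is True.

```python
import math


def chunks_needed(file_size_bytes, chunksize=1024 * 1024):
    """Estimate the number of next_chunk() calls for a file of the given size."""
    # Even an empty file takes one call to find out that it is done.
    return max(1, math.ceil(file_size_bytes / chunksize))


print(chunks_needed(300 * 1024))           # 1
print(chunks_needed(5 * 1024 * 1024 + 1))  # 6
```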
