How to export all sheets in a Spreadsheet as CSV files using the Drive API with a Service Account in Python?

I've already built a working service connection to the Drive API, and I'm creating export URLs so I can download each sheet in a spreadsheet as a CSV file, sending the requests with Google's AuthorizedSession class. For some reason, only some of the CSV files come back correct; the others contain broken HTML. When I send a single request, the sheet always comes back correct, but when I loop through the sheets and send one request per sheet, things start to break. I've identified that the problem is in how I'm passing the credentials this way, but I'm not sure whether I'm using AuthorizedSession correctly. Can anyone help me figure this out?

from googleapiclient.discovery import build
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession
import re
import shutil
import urllib.parse


CLIENT_SECRET_FILE = "client_secret.json"
API_NAME = "sheets"
API_VERSION = "v4"
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]
SPREADSHEET_ID = "Spreadsheet ID goes here"
print(CLIENT_SECRET_FILE, API_NAME, API_VERSION, SCOPES, sep="-")

cred = service_account.Credentials.from_service_account_file(
    CLIENT_SECRET_FILE, scopes=SCOPES
)

try:
    service = build(API_NAME, API_VERSION, credentials=cred)
    print(API_NAME, "service created successfully")
    result = service.spreadsheets().get(spreadsheetId=SPREADSHEET_ID).execute()
    export_url = re.sub(r"/edit$", "/export", result["spreadsheetUrl"])
    authed_session = AuthorizedSession(cred)

    for sheet in result["sheets"]:
        sheet_name = sheet["properties"]["title"]
        params = {"format": "csv", "gid": sheet["properties"]["sheetId"]}
        query_params = urllib.parse.urlencode(params)
        url = export_url + "?" + query_params
        response = authed_session.get(url)

        file_path = "./Downloads/" + sheet_name + ".csv"
        with open(file_path, "wb") as csv_file:
            csv_file.write(response.content)
            print("Downloaded sheet: " + sheet_name)
    print("Downloads complete")
except Exception as e:
    print("Unable to connect")
    print(e)
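A likely reason for the broken files is that the loop writes response.content to disk without checking it first: when a request is rejected or throttled, the export endpoint can send back an HTML error page instead of CSV, and that page is then saved under a .csv name. The sketch below (not the asker's code) builds each export URL with a raw-string regex, which avoids the invalid "\/" escape, and only accepts a response whose status is 200 and whose Content-Type starts with text/csv. That Content-Type check is an assumption about what the export endpoint sends, and build_export_url and is_csv_response are hypothetical helper names.

```python
import re
import urllib.parse


def build_export_url(spreadsheet_url: str, gid: int) -> str:
    """Turn a '.../edit' spreadsheet URL into a CSV export URL for one sheet (hypothetical helper)."""
    # A raw string avoids the invalid "\/" escape; "/" needs no escaping in a regex.
    base = re.sub(r"/edit$", "/export", spreadsheet_url)
    query = urllib.parse.urlencode({"format": "csv", "gid": gid})
    return base + "?" + query


def is_csv_response(status_code: int, content_type: str) -> bool:
    """Return True only for a successful response that claims to be CSV.

    Assumption: CSV downloads are labelled text/csv, while error or
    throttling pages come back as text/html and/or with a non-200 status.
    """
    return status_code == 200 and content_type.lower().startswith("text/csv")


if __name__ == "__main__":
    print(build_export_url("https://docs.google.com/spreadsheets/d/abc123/edit", 42))
    # https://docs.google.com/spreadsheets/d/abc123/export?format=csv&gid=42
    print(is_csv_response(200, "text/csv; charset=utf-8"))  # True
    print(is_csv_response(429, "text/html; charset=UTF-8"))  # False
```

Inside the question's loop, the check would be is_csv_response(response.status_code, response.headers.get("Content-Type", "")), and the file would only be written when it returns True. Otherwise the request can be retried or the sheet logged as failed, so an HTML page never ends up on disk as a CSV.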

This code should get you a Sheets service object:

"""Hello Sheets."""

from googleapiclient.discovery import build
# oauth2client is deprecated; google.oauth2.service_account is its maintained replacement.
from oauth2client.service_account import ServiceAccountCredentials


SCOPES = ['https://www.googleapis.com/auth/drive.readonly']
KEY_FILE_LOCATION = '<REPLACE_WITH_JSON_FILE>'


def initialize_sheet():
  """Initializes a Sheets service object.

  Returns:
    An authorized Sheets service object.
  """
  credentials = ServiceAccountCredentials.from_json_keyfile_name(
      KEY_FILE_LOCATION, SCOPES)

  # Build the Sheets API v4 service object (the API name is 'sheets').
  sheet = build('sheets', 'v4', credentials=credentials)

  return sheet

If you reuse the same Sheets service built by this function, you shouldn't have any issues looping.
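One way to stay entirely within the Sheets service is to skip the export URLs and read each sheet's values with spreadsheets.values.get, then write them out with Python's csv module. This is an alternative approach, not something the answer above wrote. The sketch assumes service is the object returned by initialize_sheet() (or any Sheets v4 service), that formatted cell values are acceptable in the CSV, and that a single quote in a sheet title is escaped by doubling it in A1 notation. export_sheets_as_csv, sheet_range and rows_to_csv_text are hypothetical names. Note that the values API omits trailing empty rows and columns, so the files may differ slightly from Google's own CSV export.

```python
import csv
import io
import os


def sheet_range(title):
    """Quote a sheet title for A1 notation.

    Assumption: a single quote inside the title is escaped by doubling it,
    as in spreadsheet formulas (e.g. It's -> 'It''s').
    """
    return "'" + title.replace("'", "''") + "'"


def rows_to_csv_text(rows):
    """Serialize rows (lists of cells) with the csv module, which quotes commas, quotes and newlines."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def export_sheets_as_csv(service, spreadsheet_id, out_dir="Downloads"):
    """Write one CSV per sheet via spreadsheets.values.get (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    meta = service.spreadsheets().get(spreadsheetId=spreadsheet_id).execute()
    written = []
    for sheet in meta["sheets"]:
        title = sheet["properties"]["title"]
        # The same service object is reused for every request in the loop.
        result = (
            service.spreadsheets()
            .values()
            .get(spreadsheetId=spreadsheet_id, range=sheet_range(title))
            .execute()
        )
        # The API omits the "values" key for an empty sheet.
        rows = result.get("values", [])
        path = os.path.join(out_dir, title + ".csv")
        # newline="" writes the csv module's \r\n line endings unchanged.
        with open(path, "w", newline="", encoding="utf-8") as f:
            f.write(rows_to_csv_text(rows))
        written.append(path)
    return written


# Usage (assumes initialize_sheet() from the answer above and a real spreadsheet ID):
#     export_sheets_as_csv(initialize_sheet(), "Spreadsheet ID goes here")
```

If there are many sheets, spreadsheets.values.batchGet accepts a list of ranges and returns them all in one response, which also reduces the number of requests sent in a short time.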

I think your use of authed_session = AuthorizedSession(cred) and response = authed_session.get(url) is correct. In your situation, I suspect that too many requests are being sent in a short time, and that this is the cause of your issue. As a simple fix, how about the following modification?

From:

for sheet in result["sheets"]:
    sheet_name = sheet["properties"]["title"]
    params = {"format": "csv", "gid": sheet["properties"]["sheetId"]}
    query_params = urllib.parse.urlencode(params)
    url = export_url + "?" + query_params
    response = authed_session.get(url)

    file_path = "./Downloads/" + sheet_name + ".csv"
    with open(file_path, "wb") as csv_file:
        csv_file.write(response.content)
        print("Downloaded sheet: " + sheet_name)

To:

for sheet in result["sheets"]:
    sheet_name = sheet["properties"]["title"]
    params = {"format": "csv", "gid": sheet["properties"]["sheetId"]}
    query_params = urllib.parse.urlencode(params)
    url = export_url + "?" + query_params
    response = authed_session.get(url)

    file_path = "./Downloads/" + sheet_name + ".csv"
    with open(file_path, "wb") as csv_file:
        csv_file.write(response.content)
        print("Downloaded sheet: " + sheet_name)

    time.sleep(3)  # <--- Added. Please adjust the value of 3 for your actual situation.
  • In this case, please also add import time at the top of the script.
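A fixed time.sleep(3) slows down every download, including the ones that would have succeeded anyway. A variation on the same idea, not part of the original answer, is to retry only when a response doesn't look like CSV and to wait longer after each failure (exponential backoff). The sketch keeps the network call behind a fetch callable so it can be tried without contacting Google. fetch_with_backoff, the "200 plus text/csv" success test, and the delay values are assumptions to tune for your situation.

```python
import time
from typing import Callable, Tuple

# (status_code, content_type, body) as returned by a hypothetical fetch callable.
Result = Tuple[int, str, bytes]


def fetch_with_backoff(
    fetch: Callable[[], Result],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it returns a 200 text/csv response, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, content_type, body = fetch()
        if status == 200 and content_type.lower().startswith("text/csv"):
            return body
        if attempt < max_attempts - 1:
            # Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"no CSV after {max_attempts} attempts (last status {status})")


# Usage inside the question's loop (sketch):
#     def fetch():
#         r = authed_session.get(url)
#         return r.status_code, r.headers.get("Content-Type", ""), r.content
#     csv_file.write(fetch_with_backoff(fetch))
```

Passing sleep as a parameter is only there so the waits can be observed or skipped when experimenting; in real use the default time.sleep applies.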
