How to access Google Sheets data using the Python requests module

I want to access the contents of a Google Doc or Google Sheet. I'm using the link that is generated when I click 'Get shareable link' in Google Docs.

I'm only able to scrape the login page's data when I use:

import requests
r = requests.get("https://docs.google.com/spreadsheets/e/abcdef12345_sample/edit?usp=sharing", auth=('user', 'pass'))
print(r.content)

But I want to scrape the contents of the spreadsheet/document itself. Note: MFA is enabled for my account.

How can I achieve that? Should I use some form of authentication other than basic auth?

Assuming you have already obtained an access token by following the OAuth 2 authentication process, you can use the following function I've written to pull the data from your Google Sheet into a pandas DataFrame.

This method uses the Python requests module directly and avoids the client packages that Google recommends.

import pandas as pd
import numpy as np
import requests

def get_google_sheet_df(headers: dict, google_sheet_id: str, sheet_name: str, _range: str):
    """_range is in A1 notation (e.g. A:I gives all rows for columns A to I)"""

    url = f'https://sheets.googleapis.com/v4/spreadsheets/{google_sheet_id}/values/{sheet_name}!{_range}'
    r = requests.get(url, headers=headers)
    r.raise_for_status()  # fail early on an expired token or a wrong sheet ID
    values = r.json()['values']
    # The first row holds the column names; the remaining rows are the data.
    df = pd.DataFrame(values[1:])
    df.columns = values[0]
    # Strip whitespace from every cell and treat empty strings as missing values.
    df = df.apply(lambda x: x.str.strip()).replace('', np.nan)
    return df

# access_token is the OAuth 2 access token you obtained beforehand.
headers = {'authorization': f'Bearer {access_token}',
           'Content-Type': 'application/vnd.api+json'}

google_sheet_id = '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms'
sheet_name = 'Class Data'
sample_range = 'A:F'

df = get_google_sheet_df(headers, google_sheet_id, sheet_name, sample_range)

You can test it on the google_sheet_id provided in this example; all you need is your own access token.

(Image: Google Sheets pull example)
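One thing to watch for with get_google_sheet_df above: the Sheets API omits trailing empty cells, so a row in 'values' can be shorter than the header row, and assigning df.columns may then fail. The following is a minimal sketch of one way to even the rows out before building the DataFrame. The helper name pad_rows and the column_N names it gives to unnamed extra columns are assumptions made for illustration, not part of the original answer.

```python
from typing import List, Tuple


def pad_rows(values: List[List[str]]) -> Tuple[List[str], List[List[str]]]:
    """Split a Sheets API 'values' array into a header and equal-width rows."""
    if not values:
        return [], []
    header, *rows = values
    # The widest row (or the header) decides how many columns we need.
    width = max([len(header)] + [len(row) for row in rows])
    # Hypothetical naming scheme for data columns that have no header cell.
    header = header + [f"column_{i + 1}" for i in range(len(header), width)]
    # Fill short rows with empty strings so every row has the same length.
    padded = [row + [""] * (width - len(row)) for row in rows]
    return header, padded


sample = [["Name", "Major", "Year"], ["Alice", "Math"], ["Bob"]]
header, rows = pad_rows(sample)
print(header)
print(rows)
```

The result can then be passed to pd.DataFrame(rows, columns=header), after which the strip-and-replace step from the function above works unchanged.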

There's a Python library called gspread; install it using pip install gspread. You also need to get OAuth2 credentials from Google using Google's developer console. All the information you need is in the gspread docs.
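As a rough sketch of what reading a sheet with gspread can look like, assuming you created a service account in the developer console, downloaded its JSON key, and shared the spreadsheet with the service account's email address. The function name read_worksheet_records and its parameters are hypothetical; check the gspread docs for the authentication flow that fits your setup.

```python
def read_worksheet_records(key_file: str, spreadsheet_id: str, worksheet_name: str):
    """Return the worksheet rows as a list of dicts keyed by the header row."""
    # Imported here so this module still loads when gspread is not installed.
    import gspread

    # Assumption: key_file is the path to a service account JSON key.
    client = gspread.service_account(filename=key_file)
    spreadsheet = client.open_by_key(spreadsheet_id)
    worksheet = spreadsheet.worksheet(worksheet_name)
    # The first row of the worksheet is used as the dict keys.
    return worksheet.get_all_records()
```

A call would look like read_worksheet_records('service_account.json', '<your spreadsheet id>', 'Class Data'), where the key file name is a placeholder for your own file.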

You can use the Google Sheets API.

The steps are:

  1. Turn on Google Sheets API
  2. Install the Google Client Library
  3. Create a file called quickstart.py. Make sure you change the SPREADSHEET_ID. To find your spreadsheet ID, check its URL. It is the part after /d/.

    https://docs.google.com/spreadsheets/d/spreadsheetId/edit#gid=sheetId

    from __future__ import print_function
    from googleapiclient.discovery import build
    from httplib2 import Http
    from oauth2client import file, client, tools

    # If modifying these scopes, delete the file token.json.
    SCOPES = 'https://www.googleapis.com/auth/spreadsheets.readonly'

    # The ID and range of a sample spreadsheet.
    SPREADSHEET_ID = 'spreadsheetId'
    RANGE_NAME = 'Class Data!A2:E'

    def main():
        """Show basic usage of Sheets API.

        Print items in sheets.
        """
        store = file.Storage('token.json')
        creds = store.get()
        if not creds or creds.invalid:
            flow = client.flow_from_clientsecrets('credentials.json', SCOPES)
            creds = tools.run_flow(flow, store)
        service = build('sheets', 'v4', http=creds.authorize(Http()))

        # Call the Sheets API
        SPREADSHEET_ID = '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms'
        RANGE_NAME = 'Class Data!A2:E'
        result = service.spreadsheets().values().get(
            spreadsheetId=SPREADSHEET_ID, range=RANGE_NAME).execute()
        values = result.get('values', [])
        if not values:
            print('No data found.')
        else:
            print('Name, Major:')
            for row in values:
                # Print columns A and E, which correspond to indices 0 and 4.
                print('%s, %s' % (row[0], row[4]))

    if __name__ == '__main__':
        main()

  4. Run quickstart.py

  5. Enjoy the API!
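Step 3 above says the spreadsheet ID is the part of the URL after /d/. A minimal sketch of pulling it out programmatically follows; the helper name extract_spreadsheet_id is hypothetical, and the pattern assumes the ID consists of letters, digits, hyphens and underscores.

```python
import re
from typing import Optional

# Captures the path segment that follows /spreadsheets/d/ in a sheet URL.
_SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_spreadsheet_id(url: str) -> Optional[str]:
    """Return the spreadsheet ID from a Google Sheets URL, or None if absent."""
    match = _SHEET_ID_PATTERN.search(url)
    return match.group(1) if match else None


url = "https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit#gid=0"
print(extract_spreadsheet_id(url))
```

Returning None rather than raising lets the caller decide how to handle a link that does not have the /d/ form, such as the placeholder URL in the question.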
