
Appending a pandas DataFrame to a Google spreadsheet

Case: My script returns a data frame that has to be appended to an existing Google spreadsheet as new rows of data. As of now, I'm appending the data frame as multiple single rows through gspread.

My code:

import gspread
import pandas as pd
df = pd.DataFrame()

# After some processing a non-empty data frame has been created.

output_conn = gc.open("SheetName").worksheet("xyz")

# Here 'SheetName' is google spreadsheet and 'xyz' is sheet in the workbook

for i, row in df.iterrows():
    output_conn.append_row(row)

Is there a way to append the entire data frame rather than multiple single rows?

I can recommend gspread-dataframe:

import pandas as pd
import gspread_dataframe as gd

# Connecting with `gspread` here

ws = gc.open("SheetName").worksheet("xyz")
existing = gd.get_as_dataframe(ws)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
updated = pd.concat([existing, your_new_data])
gd.set_with_dataframe(ws, updated)
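
One thing to watch with this approach: depending on the gspread-dataframe version, get_as_dataframe may return rows and columns that are entirely NaN for blank cells, and those would be written back between the old and new data. A minimal sketch, using only pandas, of cleaning the existing frame before concatenating. The helper name combine_for_upload and the sample frames are hypothetical, not part of gspread-dataframe:

```python
import pandas as pd


def combine_for_upload(existing, new_rows):
    """Drop all-empty rows and columns from `existing`, then stack `new_rows` below it."""
    cleaned = existing.dropna(how="all").dropna(axis=1, how="all")
    return pd.concat([cleaned, new_rows], ignore_index=True)


# Stand-in for what get_as_dataframe(ws) might return for a sheet with blank cells.
existing = pd.DataFrame({"a": [1.0, None], "b": [2.0, None], "Unnamed: 2": [None, None]})
new_rows = pd.DataFrame({"a": [3.0], "b": [4.0]})
updated = combine_for_upload(existing, new_rows)
```

The result could then be passed to gd.set_with_dataframe(ws, updated) as in the answer above.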

Here is the code to write, append (without loading the existing sheet into memory), and read from Google Sheets.

import pandas as pd
import gspread_dataframe as gd
import gspread as gs
gc = gs.service_account(filename="your/cred/file.json")

def export_to_sheets(worksheet_name, df, mode='r'):
    ws = gc.open("SHEET_NAME").worksheet(worksheet_name)
    if mode == 'w':
        ws.clear()
        gd.set_with_dataframe(worksheet=ws, dataframe=df, include_index=False, include_column_header=True, resize=True)
        return True
    elif mode == 'a':
        # Note the first new row before growing the grid, so the anchor
        # does not depend on whether add_rows() refreshes ws.row_count.
        first_new_row = ws.row_count + 1
        ws.add_rows(df.shape[0])
        gd.set_with_dataframe(worksheet=ws, dataframe=df, include_index=False, include_column_header=False, row=first_new_row, resize=False)
        return True
    else:
        return gd.get_as_dataframe(worksheet=ws)

df = pd.DataFrame.from_records([{'a': i, 'b': i * 2} for i in range(100)])
export_to_sheets("worksheet_name", df, 'a')

  1. Write Mode: First, clear the existing worksheet with ws.clear(). Second, upload the dataframe with set_with_dataframe(). Note that resize=True strictly sets the worksheet's rows and columns to df.shape. This will help later in the append method.
  2. Append Mode: First, add rows according to the dataframe. Second, set resize=False, since the rows have just been added, and pass row as one past the worksheet's previous row count so the data is anchored directly below the existing rows.
  3. Read Mode (default): returns a dataframe.
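
The append mode above relies on the grid being exactly as large as the data, which is what resize=True arranges. If the worksheet may contain spare blank rows, one alternative is to find the anchor from the cell values instead of from row_count. A minimal sketch, assuming only that the worksheet object exposes gspread's get_all_values(); the helper name next_free_row is hypothetical:

```python
def next_free_row(worksheet):
    """Return the 1-based index of the first row below the last row that holds data."""
    values = worksheet.get_all_values()
    last_used = 0
    for index, row in enumerate(values, start=1):
        # A row counts as used if at least one of its cells is non-empty.
        if any(cell != "" for cell in row):
            last_used = index
    return last_used + 1
```

The returned value could be passed as row= to set_with_dataframe. Rows would then only need to be added when next_free_row(ws) + len(df) - 1 exceeds the current grid size. Note that this does read the sheet's values, so it trades the "no loading" property for robustness.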

I was facing the same problem. Here's what I did: I converted the dataframe into a list and used gspread's append_rows().

    gc = gspread.service_account(filename="credentials.json")
    sh = gc.open_by_key('<your_key>')
    ws = sh.sheet1
    
    ##data is the original data frame
    data_list = data.values.tolist()
    
    ws.append_rows(data_list)
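
A caveat with values.tolist(): if the frame contains NaN or datetime values, the resulting list may be rejected when it is serialized to JSON for the API call. A minimal sketch of a conversion step that blanks missing values and stringifies datetime columns first. The helper name dataframe_to_rows and the timestamp format are assumptions for illustration:

```python
import pandas as pd


def dataframe_to_rows(df):
    """Convert a DataFrame to a list of rows with missing values replaced by ''."""
    out = df.copy()
    for col in out.columns:
        # Datetime columns are turned into strings; the format is an arbitrary choice.
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            out[col] = out[col].dt.strftime("%Y-%m-%d %H:%M:%S")
    # Cast to object so that '' can sit next to numbers, then blank out NaN/None.
    out = out.astype(object).where(out.notna(), "")
    return out.values.tolist()


data = pd.DataFrame({"a": [1, 2], "b": ["x", None], "c": [1.5, float("nan")]})
data_list = dataframe_to_rows(data)
```

data_list can then be handed to ws.append_rows(data_list) as in the answer above.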

The following approach, using gspread, may help one understand the procedure and solve the problem.

  1. Install the libraries in your environment.

  2. Import the libraries in the script:

    import pandas as pd
    import gspread
    from gspread_dataframe import set_with_dataframe

  3. Create credentials in the Google API console.

  4. Add the following to the script to access the Google Sheet:

    gc = gspread.service_account(filename='GoogleAPICredentials.json')
    sh = gc.open_by_key('GoogleSheetID')

Assuming one wants to add to the first sheet, use 0 in get_worksheet (for the second sheet use 1, and so on):

    worksheet = sh.get_worksheet(0)

  5. Then, to export the dataframe (assuming it is named df) to the Google Sheet:

    set_with_dataframe(worksheet, df)

I came up with the following solution. It does not overwrite current data but just appends the entire pandas DataFrame df to the end of the sheet named sheet_name in the spreadsheet named spread_sheet.

import gspread
from google.auth.transport.requests import AuthorizedSession
from oauth2client.service_account import ServiceAccountCredentials

def append_df_to_gs(df, spread_sheet:str, sheet_name:str):
    scopes = [
        'https://spreadsheets.google.com/feeds',
        'https://www.googleapis.com/auth/drive',
    ]
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        path_to_credentials,
        scopes=scopes
    )
    gsc = gspread.authorize(credentials)
    sheet = gsc.open(spread_sheet)
    params = {'valueInputOption': 'USER_ENTERED'}
    body = {'values': df.values.tolist()}
    sheet.values_append(f'{sheet_name}!A1:G1', params, body)

For the valueInputOption parameter, please consult the Google Sheets API documentation. I used USER_ENTERED here because I needed some formulas to be valid once the data is appended to Google Sheets.
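
The A1:G1 range in the call above is hard-coded; the API uses it to locate the table to append to. If you would rather derive it from the dataframe's width, here is a minimal sketch in plain Python. The helper names column_letter and header_range are hypothetical, and the sheet name is always wrapped in single quotes, which A1 notation needs for names containing spaces:

```python
def column_letter(index):
    """Convert a 1-based column index to its A1 letter(s): 1 -> 'A', 27 -> 'AA'."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def header_range(sheet_name, n_columns):
    """Build an A1 range covering the first row across `n_columns` columns."""
    # A single quote inside a sheet name is escaped by doubling it.
    quoted = "'" + sheet_name.replace("'", "''") + "'"
    return f"{quoted}!A1:{column_letter(n_columns)}1"
```

With this, the call could become sheet.values_append(header_range(sheet_name, df.shape[1]), params, body).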

ws = gc.open("sheet title").worksheet("Sheet1")

gd.set_with_dataframe(ws, dataframe)

# This simply writes your dataframe to the Google sheet

I came up with the following solution using a try/except statement: if the worksheet doesn't exist, it is created and the dataframe is written to it; otherwise the dataframe is appended.

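import gspread
from gspread_dataframe import set_with_dataframe

def load_to_sheet(conn_sheet, spreadsheet_name, df):
    try:
        worksheet = conn_sheet.worksheet(spreadsheet_name)
    except gspread.exceptions.WorksheetNotFound:
        # Worksheet is missing: create it and write the dataframe with its header.
        worksheet = conn_sheet.add_worksheet(title=spreadsheet_name, rows=100, cols=100)
        set_with_dataframe(worksheet=worksheet, dataframe=df, include_index=False,
                           include_column_header=True, resize=True)
        return
    # Worksheet exists: note where the new rows start, grow the grid, then append.
    first_new_row = worksheet.row_count + 1
    worksheet.add_rows(df.shape[0])
    set_with_dataframe(worksheet=worksheet, row=first_new_row, dataframe=df,
                       include_index=False, include_column_header=False, resize=False)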

The following does not require any external library other than gspread:

worksheet.update([dataframe.columns.values.tolist()] + dataframe.values.tolist())
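
Note that worksheet.update as written above starts at the top-left cell, so it overwrites existing data rather than appending. A minimal sketch of an append-only variant that still needs nothing beyond gspread and pandas: it uses gspread's get_all_values() and append_rows(), and writes the header row only when the sheet holds no data yet. The function name append_dataframe is hypothetical:

```python
import pandas as pd


def append_dataframe(worksheet, dataframe: pd.DataFrame):
    """Append `dataframe` below existing data, adding the header only to an empty sheet."""
    rows = dataframe.values.tolist()
    existing = worksheet.get_all_values()
    # Treat the sheet as empty if no cell contains anything.
    has_data = any(any(cell != "" for cell in row) for row in existing)
    if not has_data:
        rows = [dataframe.columns.values.tolist()] + rows
    worksheet.append_rows(rows, value_input_option="USER_ENTERED")
```

It would be called as append_dataframe(worksheet, dataframe) with the same objects as in the one-liner above.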

If the Google spreadsheet is in .csv format, you can use df.to_csv() to convert the pandas DataFrame to CSV and save it in that format.


 