
Write a pandas dataframe column by column to an existing Excel template, skipping sheet columns that contain formulas

I have been stuck on this for a day or two and am close to giving up. I am new to using Python with Excel.

Here is my scenario: I want to write a pandas dataframe to an existing Excel sheet. The sheet has 50 columns. Two of them are derived (formula columns computed from other columns) and sit at positions 48 and 50. My dataframe should therefore be written to the sheet while skipping the 48th and 50th columns. I am using win32com and pandas to do the job.

Problem statement:

When I write the dataframe to the sheet:

  1. Only the first record of the dataframe gets written, repeated over the entire Excel range. Why is the whole pandas series taken from the dataframe column not being pasted?

  2. How can I turn "None" and "NaN" values into blanks ('') for Excel in this code? (optional)

Code: the snippet below (taken from the full script) shows how I am writing my dataframe to Excel.

  1. Report_Data is the pandas dataframe. The Excel sheet I am writing to is named "Report Data".

  2. Excel_Template_File holds the file path of my Excel template, which contains the sheet "Report Data" that the dataframe should be written to.

import time

import pywintypes
from win32com import client

excel_app = client.dynamic.Dispatch("Excel.Application") # Initialize instance
excel_app.Interactive = False
excel_app.Visible = False

wb = excel_app.Workbooks.Open(Excel_Template_File)
ws = wb.Worksheets('Report Data')

for col_idx in range(0,len(Report_Data.columns)):
    col_lst = Report_Data.columns.values.tolist()
    
    if col_lst[col_idx] in [col_lst[-1], col_lst[-3]]:
        continue;
    else:
        print(col_lst[col_idx])
        col_vals = Report_Data.iloc[:,col_idx] # Copy values of column from dataframe as series
        print('mapping to cell locations...')
        
        xl_col_idx = col_idx + 1
        try: # Write column by column to avoid formula columns
            ws.Range(ws.Cells(2, xl_col_idx), 
            ws.Cells(1+len(col_vals),xl_col_idx)).Value = col_vals.values
        except pywintypes.com_error:
            print("Error")

wb.SaveAs('C:\\somepath\\Excel_'+time.strftime("%Y%m%d-%H%M%S")+'.xlsx') # Save our work
wb.Close(True)
excel_app.quit()

The try block is the part that writes the values to the given Excel range.

Validations done:

  1. I tried df.to_excel(), but it wipes my entire Excel template clean. I cannot afford that, since the workbook has more than 30-40 sheets of pivot tables and charts generated from the "Report Data" sheet.

  2. Apart from win32com I am unable to use any other Excel library, because I pull data from multiple Excel files to build the pandas dataframe that is finally written to the "Report Data" sheet. Those files are located on a network drive, and win32com suits that. The openpyxl call load_workbook() also takes forever to open the file in my case.

  3. The dataframe has correct data; I checked it by printing it with .head(). So the source Excel files have been concatenated and merged correctly.

  4. The file size is about 200 MB.

Conclusion & expected output:

Kindly help me dump each pandas series (or array) into its respective column position in Excel, writing column by column from the dataframe.

The advantage of the code above is that it neither erases the derived-column formulas at positions 48 and 50 nor wipes the workbook clean the way to_excel() does.

The issue is that the Range.Value property can take either a 1-D vector of values or a 2-D array. If Value receives a 1-D vector, Excel treats it as a single row (NOT a column), which is why only the first record ends up in every cell of a single-column range. To set the values by column, you need to convert the vector to a 2-D array with one value per row. A simplified example:

import pandas as pd
import win32com.client as wc

df = pd.DataFrame([[1,4,7],[2,5,8],[3,6,9]],columns=['A','B','C'])

print(df.head())

xl = wc.Dispatch('Excel.Application')
xl.Visible=True

wb = xl.Workbooks.Add()
ws = wb.Worksheets(1)

for col_num in range(0,len(df.columns)):
    #Convert 1D vector to 2D array
    vals = [[v] for v in df.iloc[:,col_num].values]
    ws.Range(ws.Cells(1,col_num+1),ws.Cells(len(vals),col_num+1)).Value = vals

input("Press Enter to continue...")

wb.Close(False)
xl.Quit()

Python output:

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
Press Enter to continue...

Excel sheet: [screenshot omitted]
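The question also asked how to write "None" and "NaN" as blanks. One possible approach, sketched here under assumptions, is to do the replacement while building the 2-D column. The helper name to_excel_column and its blank parameter are hypothetical and not part of pandas or win32com. The sketch only handles None and float NaN; other missing-value markers such as pandas NaT would need their own check.

```python
import math


def to_excel_column(values, blank=""):
    """Turn a flat sequence into a column-shaped list: one single-item row per value.

    None and float NaN are replaced by `blank` (hypothetical helper, not a library call).
    """
    column = []
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        column.append([blank if v is None or is_nan else v])
    return column


# Plain Python values stand in for a dataframe column here.
print(to_excel_column([1.5, None, float("nan"), "x"]))
# [[1.5], [''], [''], ['x']]
```

In the loop above, the list comprehension could then be swapped for something like vals = to_excel_column(df.iloc[:,col_num].tolist()), where tolist() hands the helper plain Python values.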

As an aside, it might be more efficient to set the values as two blocks, i.e. dataframe columns 0-46 first with df.iloc[:,range(0,47)].values, then column 48 separately. The values from the first block will already be a 2-D array.
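The block idea can be generalised with a small helper that groups the writable columns into contiguous runs. This is only a sketch: contiguous_blocks is a hypothetical name, and it assumes the dataframe has 50 columns with the formula columns at 0-based indices 47 and 49 (positions 48 and 50 on the sheet), as described in the question.

```python
def contiguous_blocks(n_cols, skip):
    """Return (start, stop) pairs, stop exclusive, covering every
    0-based column index in range(n_cols) that is not in `skip`."""
    skip = set(skip)
    blocks = []
    start = None  # first index of the run currently being built
    for i in range(n_cols):
        if i in skip:
            if start is not None:
                blocks.append((start, i))  # close the run before the skipped column
                start = None
        elif start is None:
            start = i  # open a new run
    if start is not None:
        blocks.append((start, n_cols))  # close a run that reaches the last column
    return blocks


print(contiguous_blocks(50, {47, 49}))
# [(0, 47), (48, 49)]
```

Each (start, stop) pair could then be written in one assignment, for example by passing df.iloc[:, start:stop].values.tolist() to the range from ws.Cells(2, start+1) to ws.Cells(1+len(df), stop), assuming the data starts in row 2 as in the question.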
