I have been stuck on this for a day or two and am about to give up. I am new to using Python with Excel.
Here is my scenario: I want to write a pandas dataframe to an existing Excel sheet. The sheet has 50 columns. Two of them are derived (formula columns computed from other columns) and sit at positions 48 and 50. Hence, my dataframe should be written to this sheet while skipping the 48th and 50th columns. I am using win32com and pandas to do the job.
Problem statement:
When I write the dataframe to Excel:
only the first record from the dataframe gets written across the entire Excel range. Why is the whole pandas Series taken from the dataframe column not being pasted?
how can I turn "None" and "NaN" into blanks '' for Excel in this code? (optional)
Code: The code below is a snippet (from the full script) showing how I write my dataframe to Excel.
"Report_Data" is the pandas dataframe. "Report Data" is the name of the Excel sheet I am writing to.
Excel_Template_File holds the path to my Excel template file, which contains the sheet "Report Data" that I write my dataframe to from Python.
excel_app = client.dynamic.Dispatch("Excel.Application") # Initialize instance
excel_app.Interactive = False
excel_app.Visible = False
wb = excel_app.Workbooks.Open(Excel_Template_File)
ws = wb.Worksheets('Report Data')
for col_idx in range(0,len(Report_Data.columns)):
    col_lst = Report_Data.columns.values.tolist()
    if col_lst[col_idx] in [col_lst[-1], col_lst[-3]]:
        continue
    else:
        print(col_lst[col_idx])
        col_vals = Report_Data.iloc[:,col_idx] # Copy values of column from dataframe as series
        print('mapping to cell locations...')
        xl_col_idx = col_idx + 1
        try: # Write column by column to avoid formula columns
            ws.Range(ws.Cells(2, xl_col_idx),
                     ws.Cells(1+len(col_vals),xl_col_idx)).Value = col_vals.values
        except pywintypes.com_error:
            print("Error")
wb.SaveAs('C:\\somepath\\Excel_'+time.strftime("%Y%m%d-%H%M%S")+'.xlsx') # Save our work
wb.Close(True)
excel_app.quit()
The try block is the part that writes to Excel at the given range.
Validations done:
I tried df.to_excel(), but it wipes my entire Excel template clean, which I cannot afford since the workbook has more than 30-40 sheets of pivot tables and charts generated from this "Report Data" sheet.
Apart from win32com (pywin32) I am unable to use any other Excel library, because I pull the data from multiple Excel files to build the pandas dataframe that is finally written to the sheet "Report Data". As the Excel files I am pulling from are located on a network drive, win32com suits it. The openpyxl call load_workbook() also takes forever to open the file in my case.
The dataframe has the correct data; I checked it by printing it with .head(). So the source Excel files have been concatenated and merged correctly.
The file size is about 200 MB.
Conclusion & expected output:
So kindly help me dump each pandas Series (or array) into its respective column position in Excel, writing column by column from the df,
since the above code neither erases the derived-column formulas at positions 48 and 50 nor wipes the workbook clean as to_excel does.
The issue is that the Range.Value property can take a 1-D vector of values or a 2-D array. If Value receives a 1-D vector, Excel assumes it is a single row (NOT a column), which is why the first value ends up repeated down the column. To set the values by column, you need to convert the vector to a 2-D array with one value per row. A simplified example:
import pandas as pd
import win32com.client as wc

df = pd.DataFrame([[1,4,7],[2,5,8],[3,6,9]],columns=['A','B','C'])
print(df.head())

xl = wc.Dispatch('Excel.Application')
xl.Visible=True
wb = xl.Workbooks.Add()
ws = wb.Worksheets(1)

for col_num in range(0,len(df.columns)):
    #Convert 1D vector to 2D array
    vals = [[v] for v in df.iloc[:,col_num].values]
    ws.Range(ws.Cells(1,col_num+1),ws.Cells(len(vals),col_num+1)).Value = vals

input("Press Enter to continue...")
wb.Close(False)
xl.Quit()
Python output:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
Press Enter to continue...
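For the optional part of the question (turning None and NaN into blanks), one possible approach is to clean the values while building the N x 1 array. This is only a sketch and not part of the tested example above: the helper name to_excel_column is made up for illustration, and it only catches None and float NaN, not other missing-value markers such as pandas NaT.

```python
import math


def to_excel_column(values):
    """Turn a 1-D sequence into an N x 1 list of lists, blanking None/NaN.

    Hypothetical helper: the N x 1 shape is what makes Range.Value fill
    a column instead of a row.
    """
    column = []
    for v in values:
        # NaN is a float that is not equal to itself; math.isnan detects it.
        if v is None or (isinstance(v, float) and math.isnan(v)):
            v = ''
        column.append([v])
    return column


print(to_excel_column([1, None, float('nan'), 'x']))
# [[1], [''], [''], ['x']]
```

In the question's loop this would replace col_vals.values on the right-hand side, for example to_excel_column(Report_Data.iloc[:, col_idx].tolist()); tolist() is used on the assumption that plain Python values are the safest thing to hand to COM.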
As an aside, it might be more efficient to set the values as two blocks, i.e. dataframe cols 0-46 first with df.iloc[:,range(0,47)].values, then col 48 separately. The values from the first block will already be a 2-D array.
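The block idea can be sketched roughly as follows. This is an untested-against-Excel illustration, not the answerer's code: contiguous_blocks and write_blocks are hypothetical names, rows is assumed to be the dataframe as a list of row lists (for example Report_Data.values.tolist()), skip holds the zero-based indexes of the formula columns, and ws is assumed to be a win32com worksheet object.

```python
def contiguous_blocks(n_cols, skip):
    """Return (start, stop) zero-based column ranges, stop exclusive,
    covering every column index that is not in `skip`."""
    blocks, start = [], None
    for idx in range(n_cols):
        if idx in skip:
            if start is not None:
                blocks.append((start, idx))
                start = None
        elif start is None:
            start = idx
    if start is not None:
        blocks.append((start, n_cols))
    return blocks


def write_blocks(ws, rows, skip, first_row=2):
    """Write each contiguous block of columns with one Range.Value call.

    `ws` is assumed to expose Cells(row, col) and Range(cell1, cell2)
    like an Excel worksheet; `first_row=2` leaves row 1 for headers.
    """
    if not rows:
        return
    last_row = first_row + len(rows) - 1
    for start, stop in contiguous_blocks(len(rows[0]), skip):
        # Slice the same columns out of every row: already a 2-D array.
        block = [row[start:stop] for row in rows]
        # Excel columns are 1-based, so zero-based `start` maps to start + 1
        # and the exclusive `stop` is the last 1-based column.
        ws.Range(ws.Cells(first_row, start + 1),
                 ws.Cells(last_row, stop)).Value = block


print(contiguous_blocks(50, {47, 49}))
# [(0, 47), (48, 49)]
```

For the 50-column sheet in the question, write_blocks(ws, Report_Data.values.tolist(), {47, 49}) would issue two writes instead of 48, leaving the formula columns at positions 48 and 50 untouched.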