[英]How to write to an existing excel file without overwriting data (using pandas)?
I use pandas to write to excel file in the following fashion:我使用 pandas 按以下方式写入 excel 文件:
import pandas
writer = pandas.ExcelWriter('Masterfile.xlsx')
data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
writer.save()
Masterfile.xlsx already consists of number of different tabs. Masterfile.xlsx 已经包含许多不同的选项卡。 However, it does not yet contain "Main".
但是,它还不包含“Main”。
Pandas correctly writes to "Main" sheet, unfortunately it also deletes all other tabs. Pandas 正确写入“主”工作表,不幸的是它也删除了所有其他选项卡。
Pandas docs says it uses openpyxl for xlsx files. Pandas 文档说它对 xlsx 文件使用 openpyxl。 Quick look through the code in
ExcelWriter
gives a clue that something like this might work out:快速浏览
ExcelWriter
的代码会发现这样的事情可能会奏效:
import pandas
from openpyxl import load_workbook
book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
writer.save()
UPDATE: Starting from Pandas 1.3.0 the following function will not work properly, because functions DataFrame.to_excel()
and pd.ExcelWriter()
have been changed - a new if_sheet_exists
parameter has been introduced, which has invalidated the function below.更新:从 Pandas 1.3.0 开始,以下函数将无法正常工作,因为函数
DataFrame.to_excel()
和pd.ExcelWriter()
已更改 - 引入了新的if_sheet_exists
参数,该参数使下面的函数无效。
Here you can find an updated version of the append_df_to_excel()
, which is working for Pandas 1.3.0+.在这里您可以找到
append_df_to_excel()
的更新版本,它适用于 Pandas 1.3.0+。
Here is a helper function:这是一个辅助函数:
import os
from openpyxl import load_workbook
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
truncate_sheet=False,
**to_excel_kwargs):
"""
Append a DataFrame [df] to existing Excel file [filename]
into [sheet_name] Sheet.
If [filename] doesn't exist, then this function will create it.
@param filename: File path or existing ExcelWriter
(Example: '/path/to/file.xlsx')
@param df: DataFrame to save to workbook
@param sheet_name: Name of sheet which will contain DataFrame.
(default: 'Sheet1')
@param startrow: upper left cell row to dump data frame.
Per default (startrow=None) calculate the last row
in the existing DF and write to the next row...
@param truncate_sheet: truncate (remove and recreate) [sheet_name]
before writing DataFrame to Excel file
@param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
[can be a dictionary]
@return: None
Usage examples:
>>> append_df_to_excel('d:/temp/test.xlsx', df)
>>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)
>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
index=False)
>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
index=False, startrow=25)
(c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
"""
# Excel file doesn't exist - saving and exiting
if not os.path.isfile(filename):
df.to_excel(
filename,
sheet_name=sheet_name,
startrow=startrow if startrow is not None else 0,
**to_excel_kwargs)
return
# ignore [engine] parameter if it was passed
if 'engine' in to_excel_kwargs:
to_excel_kwargs.pop('engine')
writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
# try to open an existing workbook
writer.book = load_workbook(filename)
# get the last row in the existing Excel sheet
# if it was not specified explicitly
if startrow is None and sheet_name in writer.book.sheetnames:
startrow = writer.book[sheet_name].max_row
# truncate sheet
if truncate_sheet and sheet_name in writer.book.sheetnames:
# index of [sheet_name] sheet
idx = writer.book.sheetnames.index(sheet_name)
# remove [sheet_name]
writer.book.remove(writer.book.worksheets[idx])
# create an empty sheet [sheet_name] using old index
writer.book.create_sheet(sheet_name, idx)
# copy existing sheets
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
if startrow is None:
startrow = 0
# write out the new sheet
df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
# save the workbook
writer.save()
With openpyxl
version 2.4.0
and pandas
version 0.19.2
, the process @ski came up with gets a bit simpler:随着
openpyxl
版本2.4.0
和pandas
版本0.19.2
,@ski想出的过程变得简单一点:
import pandas
from openpyxl import load_workbook
with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
writer.book = load_workbook('Masterfile.xlsx')
data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!
Starting in pandas 0.24 you can simplify this with the mode
keyword argument of ExcelWriter
:从 pandas 0.24 开始,您可以使用
ExcelWriter
的mode
关键字参数简化此mode
:
import pandas as pd
with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer:
data_filtered.to_excel(writer)
I know this is an older thread, but this is the first item you find when searching, and the above solutions don't work if you need to retain charts in a workbook that you already have created.我知道这是一个较旧的线程,但这是您在搜索时找到的第一个项目,如果您需要在已创建的工作簿中保留图表,则上述解决方案不起作用。 In that case, xlwings is a better option - it allows you to write to the excel book and keeps the charts/chart data.
在这种情况下,xlwings 是更好的选择 - 它允许您写入 Excel 书籍并保留图表/图表数据。
simple example:简单的例子:
import xlwings as xw
import pandas as pd
#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5
#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')
ws = wb.sheets['chartData']
ws.range('A1').options(index=False).value = df
wb = xw.Book('C:\\data\\bookwithChart_updated.xlsx')
xw.apps[0].quit()
Old question, but I am guessing some people still search for this - so...老问题,但我猜有些人仍在寻找这个 - 所以......
I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, created by pandas with the sheetname=None option.我觉得这个方法很好,因为所有的工作表都被加载到一个由 Pandas 使用 sheetname=None 选项创建的工作表名称和数据框对的字典中。 It is simple to add, delete or modify worksheets between reading the spreadsheet into the dict format and writing it back from the dict.
在将电子表格读入 dict 格式和从 dict 写回之间添加、删除或修改工作表很简单。 For me the xlsxwriter works better than openpyxl for this particular task in terms of speed and format.
对我来说,xlsxwriter 在速度和格式方面比 openpyxl 更适合这项特定任务。
Note: future versions of pandas (0.21.0+) will change the "sheetname" parameter to "sheet_name".注意:pandas (0.21.0+) 的未来版本会将“sheetname”参数更改为“sheet_name”。
# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
sheetname=None)
# all worksheets are accessible as dataframes.
# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']
# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df
# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe
# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
engine='xlsxwriter',
datetime_format='yyyy-mm-dd',
date_format='yyyy-mm-dd') as writer:
for ws_name, df_sheet in ws_dict.items():
df_sheet.to_excel(writer, sheet_name=ws_name)
For the example in the 2013 question:对于 2013 年问题中的示例:
ws_dict = pd.read_excel('Masterfile.xlsx',
sheetname=None)
ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]
with pd.ExcelWriter('Masterfile.xlsx',
engine='xlsxwriter') as writer:
for ws_name, df_sheet in ws_dict.items():
df_sheet.to_excel(writer, sheet_name=ws_name)
The solution of @MaxU is not working for the updated version of python and related packages. @MaxU 的解决方案不适用于更新版本的python 和相关包。 It raises the error: "zipfile.BadZipFile: File is not a zip file"
它引发错误: “zipfile.BadZipFile:文件不是 zip 文件”
I generated a new version of the function that works fine with the updated version of python and related packages and tested with python: 3.9 |我生成了一个新版本的函数,可以很好地与更新版本的 python 和相关包配合使用,并使用 python 进行测试:3.9 | openpyxl: 3.0.6 |
openpyxl: 3.0.6 | pandas: 1.2.3
熊猫:1.2.3
In addition I added more features to the helper function:此外,我为辅助函数添加了更多功能:
Here the function:这里的功能:
import pandas as pd
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
truncate_sheet=False, resizeColumns=True, na_rep = 'NA', **to_excel_kwargs):
"""
Append a DataFrame [df] to existing Excel file [filename]
into [sheet_name] Sheet.
If [filename] doesn't exist, then this function will create it.
Parameters:
filename : File path or existing ExcelWriter
(Example: '/path/to/file.xlsx')
df : dataframe to save to workbook
sheet_name : Name of sheet which will contain DataFrame.
(default: 'Sheet1')
startrow : upper left cell row to dump data frame.
Per default (startrow=None) calculate the last row
in the existing DF and write to the next row...
truncate_sheet : truncate (remove and recreate) [sheet_name]
before writing DataFrame to Excel file
resizeColumns: default = True . It resize all columns based on cell content width
to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
[can be dictionary]
na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''
Returns: None
*******************
CONTRIBUTION:
Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
Features of the new helper function:
1) Now it works with python 3.9 and latest versions of pandas and openpxl
---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
2) Now It resize all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
3) You can handle NaN, if you want that NaN are displayed as NaN or as empty cells (SEE "na_rep")
4) Added "startcol", you can decide to start to write from specific column, oterwise will start from col = 0
*******************
"""
from openpyxl import load_workbook
from string import ascii_uppercase
from openpyxl.utils import get_column_letter
from openpyxl import Workbook
# ignore [engine] parameter if it was passed
if 'engine' in to_excel_kwargs:
to_excel_kwargs.pop('engine')
try:
f = open(filename)
# Do something with the file
except IOError:
# print("File not accessible")
wb = Workbook()
ws = wb.active
ws.title = sheet_name
wb.save(filename)
writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
# Python 2.x: define [FileNotFoundError] exception if it doesn't exist
try:
FileNotFoundError
except NameError:
FileNotFoundError = IOError
try:
# try to open an existing workbook
writer.book = load_workbook(filename)
# get the last row in the existing Excel sheet
# if it was not specified explicitly
if startrow is None and sheet_name in writer.book.sheetnames:
startrow = writer.book[sheet_name].max_row
# truncate sheet
if truncate_sheet and sheet_name in writer.book.sheetnames:
# index of [sheet_name] sheet
idx = writer.book.sheetnames.index(sheet_name)
# remove [sheet_name]
writer.book.remove(writer.book.worksheets[idx])
# create an empty sheet [sheet_name] using old index
writer.book.create_sheet(sheet_name, idx)
# copy existing sheets
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
except FileNotFoundError:
# file does not exist yet, we will create it
pass
if startrow is None:
# startrow = -1
startrow = 0
if startcol is None:
startcol = 0
# write out the new sheet
df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)
if resizeColumns:
ws = writer.book[sheet_name]
def auto_format_cell_width(ws):
for letter in range(1,ws.max_column):
maximum_value = 0
for cell in ws[get_column_letter(letter)]:
val_to_check = len(str(cell.value))
if val_to_check > maximum_value:
maximum_value = val_to_check
ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2
auto_format_cell_width(ws)
# save the workbook
writer.save()
Example Usage:示例用法:
# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
'colors': ['red', 'white', 'blue'],
'colorsTwo': ['yellow', 'white', 'blue'],
'NaNcheck': [float('NaN'), 1, float('NaN')],
})
# EDIT YOUR PATH FOR THE EXPORT
filename = r"C:\DataScience\df.xlsx"
# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCELFILE
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN
def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
try:
master_book = load_workbook(master_file_path)
master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
master_writer.book = master_book
master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
header=None,
index_col=None)
current_frames.to_excel(master_writer, sheet_name, index=None, header=False)
master_writer.save()
except Exception as e:
raise e
This works perfectly fine only thing is that formatting of the master file(file to which we add new sheet) is lost.这工作得很好,唯一的问题是主文件(我们添加新工作表的文件)的格式丢失了。
writer = pd.ExcelWriter('prueba1.xlsx'engine='openpyxl',keep_date_col=True)
“keep_date_col”希望对你有帮助
book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()
import pandas as pd
from openpyxl import load_workbook
def write_to_excel(df, file):
try:
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, **kwds)
writer.save()
except FileNotFoundError as e:
df.to_excel(file, **kwds)
df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])
Solution by @MaxU worked very well. @MaxU 的解决方案效果很好。 I have just one suggestion:
我只有一个建议:
If truncate_sheet=True is specified than "startrow" should NOT be retained from existing sheet.如果指定了 truncate_sheet=True,则不应从现有工作表中保留“startrow”。 I suggest:
我建议:
if startrow is None and sheet_name in writer.book.sheetnames:
if not truncate_sheet: # truncate_sheet would use startrow if provided (or zero below)
startrow = writer.book[sheet_name].max_row
I used the answer described here我使用了这里描述的答案
from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()
I'd reccommend using xlwings ( https://docs.xlwings.org/en/stable/api.html ), it is really powerful for this application... This is how I use it:我建议使用 xlwings ( https://docs.xlwings.org/en/stable/api.html ),它对这个应用程序非常强大......这就是我使用它的方式:
import xlwings as xw
import pandas as pd
import xlsxwriter
# function to get the active workbook
def getActiveWorkbook():
try:
# logic from xlwings to grab the current excel file
activeWb = xw.books.active
except:
# print error message if unable to get the current workbook
print('Unable to grab the current Workbook')
pause()
exitProgram()
else:
return activeWb
# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
lastRow = myBook.sheets[sheetName].range("A1").current_region.last_cell.row
lastCol = str(xlsxwriter.utility.xl_col_to_name(myBook.sheets[sheetName].range("A1").current_region.last_cell.column))
return str(lastRow), lastCol + str(lastRow)
activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1,2,3])
# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')
This seems to work very well for my applications because.xlsm workbooks can be very tricky.这似乎对我的应用程序非常有效,因为 .xlsm 工作簿可能非常棘手。 You can execute this as a python script or turn it into and executable with pyinstaller and then run the.exe through an excel macro.
您可以将其作为 python 脚本执行,也可以使用 pyinstaller 将其转换为可执行文件,然后通过 excel 宏运行.exe。 You can also call VBA macros from Python using xlwings which is very useful.
您还可以使用 xlwings 从 Python 调用 VBA 宏,这非常有用。
You can write to an existing Excel file without overwriting data using pandas by using the pandas.DataFrame.to_excel() method and specifying the mode parameter as 'a' (append mode).通过使用 pandas.DataFrame.to_excel() 方法并将模式参数指定为“a”(追加模式),您可以写入现有的 Excel 文件,而不会使用 pandas 覆盖数据。
Here's an example:这是一个例子:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
# Write the DataFrame to an existing Excel file in append mode
df.to_excel('existing_file.xlsx', engine='openpyxl', mode='a', index=False, sheet_name='Sheet1')
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.