
How to write to an existing excel file without overwriting data (using pandas)?

I use pandas to write to an Excel file in the following fashion:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", columns=['Diff1', 'Diff2'])

writer.save()

Masterfile.xlsx already consists of a number of different tabs. However, it does not yet contain "Main".

Pandas correctly writes to the "Main" sheet; unfortunately, it also deletes all the other tabs.

The pandas docs say it uses openpyxl for xlsx files. A quick look through the code in ExcelWriter gives a clue that something like this might work out:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", columns=['Diff1', 'Diff2'])

writer.save()

UPDATE: Starting from Pandas 1.3.0 the following function will not work properly, because the functions DataFrame.to_excel() and pd.ExcelWriter() have been changed: a new if_sheet_exists parameter has been introduced, which has invalidated the function below.

Here you can find an updated version of append_df_to_excel(), which works with Pandas 1.3.0+.
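
The linked update is not reproduced on this page. As a minimal sketch of the same idea for pandas 1.4.0+, assuming the if_sheet_exists='overlay' option of pd.ExcelWriter is available (the helper name append_rows and the temporary file path are hypothetical, not taken from the original answer):

```python
import os
import tempfile

import pandas as pd


def append_rows(filename, df, sheet_name="Sheet1", **to_excel_kwargs):
    """Hypothetical helper: write df below the existing rows of sheet_name."""
    if not os.path.isfile(filename):
        # No workbook yet: create it with a plain write.
        df.to_excel(filename, sheet_name=sheet_name, **to_excel_kwargs)
        return
    # mode="a" keeps the existing sheets; "overlay" writes into an existing
    # sheet instead of raising an error or replacing the sheet.
    with pd.ExcelWriter(filename, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        startrow = 0
        if sheet_name in writer.book.sheetnames:
            # max_row is the last used row, so writing starts right below it
            startrow = writer.book[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    **to_excel_kwargs)


# Usage: the second call appends without repeating the header row.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
df = pd.DataFrame({"Diff1": [1, 2, 3], "Diff2": [4, 5, 6]})
append_rows(path, df, index=False)
append_rows(path, df, index=False, header=False)
result = pd.read_excel(path)
```

Unlike the helper below, this sketch does not assign writer.book or writer.sheets, which are the assignments that stopped working in newer pandas releases.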


Here is a helper function:

import os
import pandas as pd
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name, 
            startrow=startrow if startrow is not None else 0, 
            **to_excel_kwargs)
        return
    
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)
    
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    
    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

Tested with the following versions:

  • Pandas 1.2.3
  • Openpyxl 3.0.5

With openpyxl version 2.4.0 and pandas version 0.19.2, the process @ski came up with gets a bit simpler:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", columns=['Diff1', 'Diff2'])
# That's it!

Starting in pandas 0.24 you can simplify this with the mode keyword argument of ExcelWriter:

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer: 
     data_filtered.to_excel(writer) 
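
To check that append mode really keeps the other tabs, here is a small self-contained sketch. The file name, sheet names, and sample data are hypothetical and only serve the demonstration; it assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file and sheet names, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "the_file.xlsx")

# Create a workbook that already has one tab.
pd.DataFrame({"x": [1, 2]}).to_excel(path, sheet_name="Existing", index=False)

# Append a new tab; mode="a" loads the workbook instead of starting fresh.
data_filtered = pd.DataFrame({"Diff1": [0.1, 0.2], "Diff2": [0.3, 0.4]})
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    data_filtered.to_excel(writer, sheet_name="Main", index=False)

# Both the old and the new tab should now be present.
with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names
print(sheet_names)
```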

I know this is an older thread, but this is the first item you find when searching, and the above solutions don't work if you need to retain charts in a workbook that you have already created. In that case, xlwings is a better option: it allows you to write to the Excel workbook and keeps the charts and chart data.

Simple example:

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

# save the updated workbook under a new name (the chart is kept)
wb.save('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()

There is a better solution in pandas 0.24:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

(Screenshots of the workbook before and after the write appeared here in the original answer.)

So upgrade your pandas now:

pip install --upgrade pandas

Old question, but I am guessing some people still search for this, so...

I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, created by pandas with the sheetname=None option. It is simple to add, delete or modify worksheets between reading the spreadsheet into the dict format and writing it back from the dict. For me, xlsxwriter works better than openpyxl for this particular task in terms of speed and format.

Note: newer versions of pandas (0.21.0+) change the "sheetname" parameter to "sheet_name".

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheetname=None)

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

For the example in the 2013 question:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheetname=None)

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)
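
On pandas 0.21.0+ the same round trip uses sheet_name instead of sheetname. A self-contained sketch under that assumption follows; the temporary path, sheet names, and sample data are hypothetical, and the default writer engine is used rather than xlsxwriter so that it runs with whichever engine is installed.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")  # hypothetical path

# Build a small workbook with two existing tabs.
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="First", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Second", index=False)

# sheet_name=None returns a dict of {sheet name: DataFrame}.
ws_dict = pd.read_excel(path, sheet_name=None)
ws_dict["Main"] = pd.DataFrame({"Diff1": [0.1], "Diff2": [0.2]})

# Rewrite the whole workbook from the dict. Only cell values are carried
# over; formatting, formulas and charts in the original file are not.
with pd.ExcelWriter(path) as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name, index=False)

with pd.ExcelFile(path) as xls:
    final_sheets = xls.sheet_names
```

Because the file is rebuilt from DataFrames, this approach suits data-only workbooks; for workbooks with charts or styling, the append-mode or xlwings answers are the safer choice.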

The solution of @MaxU does not work with updated versions of Python and related packages. It raises the error: "zipfile.BadZipFile: File is not a zip file"

I wrote a new version of the function that works with updated versions of Python and related packages, tested with python: 3.9 | openpyxl: 3.0.6 | pandas: 1.2.3

In addition, I added more features to the helper function:

  1. It now resizes all columns based on cell content width, so all variables are visible (see "resizeColumns")
  2. You can choose whether NaN is displayed as NaN or as an empty cell (see "na_rep")
  3. Added "startcol": you can start writing from a specific column; otherwise it starts from col = 0

Here is the function:

import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
    truncate_sheet=False, resizeColumns=True, na_rep = 'NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file

      resizeColumns: default = True. Resizes all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''


    Returns: None

    *******************

    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)


    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpyxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) It now resizes all columns based on cell content width, so all variables are visible (SEE "resizeColumns")
    3) You can choose whether NaN is displayed as NaN or as an empty cell (SEE "na_rep")
    4) Added "startcol": you can start writing from a specific column, otherwise it starts from col = 0

    *******************



    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    try:
        # check that the file exists and is readable (and close it again)
        with open(filename):
            pass
    except IOError:
        # file not accessible: create an empty workbook containing [sheet_name]
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')


    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        # startrow = -1
        startrow = 0

    if startcol is None:
        startcol = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)


    if resizeColumns:

        ws = writer.book[sheet_name]

        def auto_format_cell_width(ws):
            # max_column is inclusive, so add 1 to also resize the last column
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2

        auto_format_cell_width(ws)

    # save the workbook
    writer.save()

Example Usage:

# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue'],
                    'NaNcheck': [float('NaN'), 1, float('NaN')],
                    })

# EDIT YOUR PATH FOR THE EXPORT 
filename = r"C:\DataScience\df.xlsx"   

# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCELFILE 
  
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN
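
The column-resizing step of the helper can also be tried on its own. The following is a rough sketch using only openpyxl; the function name autosize_columns, the padding value, and the sample cells are assumptions for illustration, and the width is a simple character-count estimate rather than a true rendered width.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def autosize_columns(ws, padding=2):
    """Set each column's width from its longest cell text (rough estimate)."""
    # max_column is 1-based and inclusive, hence the +1 in range().
    for col_idx in range(1, ws.max_column + 1):
        letter = get_column_letter(col_idx)
        longest = max(
            (len(str(cell.value)) for cell in ws[letter] if cell.value is not None),
            default=0,
        )
        ws.column_dimensions[letter].width = longest + padding


# Usage on an in-memory workbook with a header row and one data row.
wb = Workbook()
ws = wb.active
ws.append(["numbers", "colors"])
ws.append([1, "white"])
autosize_columns(ws)
widths = {k: ws.column_dimensions[k].width for k in ("A", "B")}
```

Empty cells are skipped here so that a blank column does not get sized to the text "None".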

import pandas
from openpyxl import load_workbook

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    # load the master workbook and attach it to the writer so existing sheets are kept
    master_book = load_workbook(master_file_path)
    master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
    master_writer.book = master_book
    master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
    # read the first sheet of the current file without header or index
    current_file = pandas.ExcelFile(current_file_path)
    current_frames = current_file.parse(current_file.sheet_names[0],
                                        header=None,
                                        index_col=None)
    current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

    master_writer.save()

This works perfectly fine; the only issue is that the formatting of the master file (the file to which we add the new sheet) is lost.

writer = pd.ExcelWriter('prueba1.xlsx', engine='openpyxl', keep_date_col=True)

Hopefully "keep_date_col" helps you.

book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()

Method:

  • Can create the file if it is not present
  • Appends to an existing Excel file as per sheet name

import pandas as pd
from openpyxl import load_workbook

def write_to_excel(df, file, **kwds):
    try:
        book = load_workbook(file)
        writer = pd.ExcelWriter(file, engine='openpyxl') 
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, **kwds)
        writer.save()
    except FileNotFoundError as e:
        df.to_excel(file, **kwds)

Usage:

df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])

The solution by @MaxU worked very well. I have just one suggestion:

If truncate_sheet=True is specified, then "startrow" should NOT be retained from the existing sheet. I suggest:

        if startrow is None and sheet_name in writer.book.sheetnames:
            if not truncate_sheet: # truncate_sheet would use startrow if provided (or zero below)
                startrow = writer.book[sheet_name].max_row
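
The start-row decision described in this suggestion can be isolated into a small pure function. This is only an illustrative sketch; the name resolve_startrow and its parameters are hypothetical and not part of @MaxU's helper.

```python
def resolve_startrow(startrow, truncate_sheet, sheet_exists, max_row):
    """Hypothetical helper mirroring the start-row decision described above."""
    if startrow is not None:
        return startrow   # an explicit value always wins
    if sheet_exists and not truncate_sheet:
        return max_row    # append below the existing rows
    return 0              # new or truncated sheet: start at the top


# Appending to an existing 12-row sheet starts at row index 12;
# truncating the same sheet starts again from 0.
append_row = resolve_startrow(None, False, True, 12)
truncate_row = resolve_startrow(None, True, True, 12)
```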

I used the answer described here

import pandas as pd
from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()

I'd recommend using xlwings (https://docs.xlwings.org/en/stable/api.html); it is really powerful for this application. This is how I use it:

import xlwings as xw
import pandas as pd
import xlsxwriter

# function to get the active workbook
def getActiveWorkbook():
    try:
        # logic from xlwings to grab the current excel file
        activeWb = xw.books.active
    except Exception:
        # stop with an error message if the current workbook cannot be found
        raise SystemExit('Unable to grab the current Workbook')
    else:
        return activeWb

# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
    lastRow = myBook.sheets[sheetName].range("A1").current_region.last_cell.row
    lastCol = str(xlsxwriter.utility.xl_col_to_name(myBook.sheets[sheetName].range("A1").current_region.last_cell.column))
    return str(lastRow), lastCol + str(lastRow)

activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1,2,3])

# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
    ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')

This seems to work very well for my applications because .xlsm workbooks can be very tricky. You can execute this as a Python script or turn it into an executable with pyinstaller and then run the .exe through an Excel macro. You can also call VBA macros from Python using xlwings, which is very useful.

You can write to an existing Excel file without overwriting data by opening it with pandas.ExcelWriter in append mode (mode='a') and passing the writer to pandas.DataFrame.to_excel(). The mode parameter belongs to ExcelWriter, not to to_excel().

Here's an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Open the existing Excel file in append mode and write the DataFrame to a new sheet
# (with pandas 1.3+, a sheet name that already exists raises an error
# unless if_sheet_exists is set on the writer)
with pd.ExcelWriter('existing_file.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, index=False, sheet_name='Sheet2')


