
Combine Multiple Workbooks into One

I have a varying number of input .xlsx documents, each containing 12 sheets (the sheet names are the same in every document). I need to combine them into one .xlsx document that keeps the original sheet names, with the data from every document appended to the matching sheet.

For example, here are my original output and my desired output:

Original Output

Desired Output
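Below is a minimal sketch of the whole merge. It makes two assumptions:

- openpyxl 2.6 or later, which is needed for `iter_rows(values_only=True)`.
- Cached formula results, rather than the formulas themselves, should be copied (`data_only=True`).

It relies on `Worksheet.append`, which writes to the row after the last row that sheet has written, so no row offset has to be tracked by hand. `combine_workbooks` is a hypothetical name. The "Project" label row copies the label written by the follow-up code further down; adjust it to match the desired output.

```python
import os

import openpyxl


def combine_workbooks(input_paths, output_path):
    """Append each sheet of every input workbook to the same-named sheet of
    one output workbook, preceded by a 'Project' row holding the file name."""
    combined = openpyxl.Workbook()
    # Workbook() starts with one blank sheet titled "Sheet"; drop it so only
    # sheets copied from the inputs remain (assumes at least one input file,
    # since a workbook must contain a sheet when it is saved).
    combined.remove(combined.active)

    for path in input_paths:
        source = openpyxl.load_workbook(path, data_only=True)
        file_name = os.path.basename(path)
        for src_ws in source.worksheets:
            if src_ws.title in combined.sheetnames:
                # sheet already created by an earlier file: reuse it and
                # leave a blank separator row before the next block
                dst_ws = combined[src_ws.title]
                dst_ws.append([])
            else:
                dst_ws = combined.create_sheet(src_ws.title)
            dst_ws.append(["Project", file_name])
            # values_only=True yields plain tuples of cell values
            for values in src_ws.iter_rows(values_only=True):
                dst_ws.append(values)

    combined.save(output_path)
```

Each worksheet tracks its own `append` position, so every sheet's blocks stay contiguous no matter how many input files are processed.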

For now I am not writing the inputFile name anywhere; I am only trying to merge everything into one workbook. However, I keep getting an error:

error

def createEmptyWorkbook(self, outputFileName):
    logging.info('creating empty workbook: %s' % (outputFileName))
    # create empty workbook
    ncoa_combined_report = openpyxl.Workbook()

    # save new file
    ncoa_combined_report.save(outputFileName)

    ncoa_combined_report = openpyxl.load_workbook(filename=outputFileName)#, data_only=True)

    return ncoa_combined_report

def combine_sheets(self, inputFiles):
    logging.info('combining ncoa reports to one workbook')

    # new output files
    outputFile = os.path.join(self.processingDir, 'combined_ncoa_report.xlsx')

    # create empty workbook
    ncoa_combined_report = self.createEmptyWorkbook(outputFile)

    # get a list of sheet names created in output file
    outputSheetNames = ncoa_combined_report.sheetnames

    for inputFile in inputFiles:
        logging.info('reading ncoa report: %s' % (os.path.split(inputFile)[-1]))
        # load entire input file into memory
        input_wb = openpyxl.load_workbook(filename = inputFile)#, data_only=True)

        # get sheet name values in inputFile 
        sheets = input_wb.sheetnames

        # iterate worksheets in input file
        for worksheet in input_wb.worksheets:
            outputSheetMaxRow = 0
            currentSheet = ''
            row = ''
            column = ''

            logging.info('working on sheet: %s' % (worksheet.title))
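            # check if the sheet exists in the output file and add it if necessary
            # (read sheetnames fresh each time so sheets created for earlier input files are found)
            if worksheet.title not in ncoa_combined_report.sheetnames:
                logging.info('creating sheet: %s' % (worksheet.title))
                currentSheet = ncoa_combined_report.create_sheet(worksheet.title)
            else:
                # look up the Worksheet object; worksheet.title alone is just a str
                currentSheet = ncoa_combined_report[worksheet.title]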

            ## check if the default sheet is still in the output and remove it
            #if 'Sheet' in ncoa_combined_report.sheetnames:
            #    ncoa_combined_report.remove(ncoa_combined_report['Sheet'])

            # read the offset once per sheet; re-reading max_row inside the cell
            # loop moves every later cell further down as rows are written
            outputSheetMaxRow = currentSheet.max_row

            for row, entry in enumerate(worksheet, start=1):
                logging.info('working on row: %s' % (row))
                for cell in entry:
                    try:
                        # add cell value to output file (cell.column is an integer index in recent openpyxl versions)
                        currentSheet.cell(row=row+outputSheetMaxRow, column=cell.column).value = cell.value
                    except Exception as err:
                        logging.critical('could not add row:%s, cell:%s' % (row, cell.coordinate))
                        raise ValueError('could not add row:%s, cell:%s' % (row, cell.coordinate)) from err

        # save new file
        ncoa_combined_report.save(outputFile)
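The original inner loop reads `currentSheet.max_row` again before every cell. As soon as one cell of a new row is written, the offset grows, so the copied data spreads into a staircase instead of a block.

The sketch below reproduces this side by side. `copy_block` and its flag are hypothetical names used only for illustration. An empty openpyxl sheet reports `max_row == 1`, so even the fixed copy leaves row 1 blank.

```python
import openpyxl


def copy_block(src_ws, dst_ws, recompute_offset_per_cell):
    """Copy src_ws below the existing content of dst_ws.

    With recompute_offset_per_cell=True this mirrors the original loop,
    which re-reads dst_ws.max_row before writing every cell."""
    offset = dst_ws.max_row
    for row_num, row in enumerate(src_ws.iter_rows(), start=1):
        for cell in row:
            if recompute_offset_per_cell:
                offset = dst_ws.max_row  # grows once a lower row has been written
            dst_ws.cell(row=row_num + offset, column=cell.column).value = cell.value


wb = openpyxl.Workbook()
src = wb.active
src.append(["a", "b"])
src.append(["c", "d"])

buggy = wb.create_sheet("buggy")
fixed = wb.create_sheet("fixed")
copy_block(src, buggy, recompute_offset_per_cell=True)   # a->A2, b->B3, c->A5, d->B7
copy_block(src, fixed, recompute_offset_per_cell=False)  # a,b in row 2; c,d in row 3
```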

I am not sure why I am getting the error or what I need to update to correct it. Any guidance is appreciated.
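The error text is not reproduced in the question, so the following is only a guess at likely causes. Two parts of the code above do not do what their comments suggest:

- `ncoa_combined_report.sheetnames` returns a new list on every access. `outputSheetNames` is therefore a snapshot that never sees sheets created later, and each later input file creates the sheet again under a new name.
- The `else` branch stores `worksheet.title`, which is a `str`, rather than the worksheet. `currentSheet.max_row` would then raise `AttributeError`.

Separately, older openpyxl releases returned the column letter from `cell.column` rather than an integer. That would break `currentSheet.cell(..., column=cell.column)`, and it would be consistent with the follow-up fix that converts the letter with `column_index_from_string`.

A small sketch of the first two points (the sheet name "Moves" is made up):

```python
import openpyxl

wb = openpyxl.Workbook()
output_sheet_names = wb.sheetnames      # a snapshot list: ['Sheet']
wb.create_sheet("Moves")                # the first input file creates the sheet

# Problem 1: the snapshot does not see the new sheet...
print("Moves" in output_sheet_names)    # False
print("Moves" in wb.sheetnames)         # True
# ...so the next input file creates it again, and openpyxl renames the duplicate
duplicate = wb.create_sheet("Moves")
print(duplicate.title)                  # Moves1

# Problem 2: keeping only the title string, as the else branch does
current_sheet = "Moves"
try:
    current_sheet.max_row
except AttributeError as err:
    print(err)                          # 'str' object has no attribute 'max_row'

# Fix: look the worksheet up by its title
current_sheet = wb["Moves"]
print(current_sheet.max_row)            # 1 (an empty sheet reports max_row == 1)
```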

I think I found the issue in this portion of the code. openpyxl.utils has helpers that split a cell coordinate into its column letter and row number and convert the letter into a column index. These let me write the appended values at the correct locations. Hopefully this helps someone else in the future.

            for line, entry in enumerate(worksheet, start=1):
                #logging.info('working on row: %s' % (row))
                for cell in entry:
                    # split the coordinate into its column letter and row number
                    xy = openpyxl.utils.cell.coordinate_from_string(cell.coordinate) # returns ('A', 4)
                    col = openpyxl.utils.cell.column_index_from_string(xy[0]) # returns 1
                    rowCord = xy[1]
                    # inputFileCount and outputSheetMaxRow are maintained by surrounding code not shown here
                    if line == 1 and inputFileCount == 1:
                        # first input file: label the block with the file name in row 1
                        currentSheet.cell(row=1, column=1).value = 'Project'
                        currentSheet.cell(row=1, column=2).value = os.path.split(inputFile)[-1]
                    if line == 1 and inputFileCount > 1:
                        # later input files: label row placed after a blank row below the existing data
                        currentSheet.cell(row=outputSheetMaxRow + 2, column=1).value = 'Project'
                        currentSheet.cell(row=outputSheetMaxRow + 2, column=2).value = os.path.split(inputFile)[-1]
                    else:
                        # add cell value to output file, shifted below the existing data
                        currentSheet.cell(row=outputSheetMaxRow + rowCord + 1, column=col).value = cell.value
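The two helpers used above can be imported from `openpyxl.utils.cell` in current releases. The sketch below shows what they return, next to the `cell.row` and `cell.column` attributes, which give the same numbers directly in recent openpyxl versions. `cell_position` is a hypothetical helper name.

```python
from openpyxl import Workbook
from openpyxl.utils.cell import (
    column_index_from_string,
    coordinate_from_string,
    coordinate_to_tuple,
)


def cell_position(coordinate):
    """Split an A1-style coordinate into (row, column index)."""
    col_letter, row = coordinate_from_string(coordinate)  # "C7" -> ("C", 7)
    return row, column_index_from_string(col_letter)      # "C" -> 3


ws = Workbook().active
cell = ws["C7"]
print(cell_position(cell.coordinate))  # (7, 3)
print(coordinate_to_tuple("C7"))       # (7, 3), the same result in one call
print(cell.row, cell.column)           # 7 3
```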

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.
