
Match unique column from two .xlsx files, if match update else append using openpyxl

I have two Excel files, Master and Child. All the column names are the same in both files.

I want to match column C of both files. If there is a match, update all the columns for that specific row; if there is no match, append the row at the end of the Master file.

I want to update data from the Child file into the Master file based on this logic. So far I am able to update the Master file by copying all the data from Child to Master, but the data in the Master file is simply replaced by the Child file's data for the specified range. Any help will be much appreciated.

import openpyxl 

Master = openpyxl.load_workbook("Master.xlsx")
Child = openpyxl.load_workbook("Child.xlsx")

Master_File = Master["Sheet1"]
Child_File = Child["Sheet1"]

Function to copy rows and columns from the Child file

def copyRange(startCol, startRow, endCol, endRow, sheet):
    rangeSelected = []
    # Loop through the selected rows
    for i in range(startRow, endRow + 1):
        # Collect the cell values of the current row
        rowSelected = []
        for j in range(startCol, endCol + 1):
            rowSelected.append(sheet.cell(row=i, column=j).value)
        # Nest the row list inside the rangeSelected list
        rangeSelected.append(rowSelected)
    return rangeSelected

Function to paste all the data into the Master file

def pasteRange(startCol, startRow, endCol, endRow, sheetReceiving, copiedData):
    # Write the copied values cell by cell into the same range of the receiving sheet
    countRow = 0
    for i in range(startRow, endRow + 1):
        countCol = 0
        for j in range(startCol, endCol + 1):
            sheetReceiving.cell(row=i, column=j).value = copiedData[countRow][countCol]
            countCol += 1
        countRow += 1

Main function

def createData():
    print("Your data is being Processed.....")
    # Copy columns 1-39 of rows 10-45 from the Child sheet
    selectedRange = copyRange(1, 10, 39, 45, Child_File)
    # Paste over the same range in the Master sheet (this overwrites existing rows)
    pasteRange(1, 10, 39, 45, Master_File, selectedRange)
    Master.save("Final.xlsx")
    print("Range copied and pasted")

createData()
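
For comparison, the update-or-append rule described in the question could be sketched with openpyxl alone. This is only a rough sketch under some assumptions: the key is column C (index 3), row 1 is a header, and keys are unique within each sheet. The names upsert_rows and KEY_COL are made up for illustration, and the demo uses small in-memory workbooks rather than the real files.

```python
from openpyxl import Workbook

KEY_COL = 3  # column C, 1-based (assumption taken from the question)


def upsert_rows(master_ws, child_ws, key_col=KEY_COL, header_rows=1):
    """Overwrite master rows whose key matches a child row; append the rest."""
    # Map every key already present in the master sheet to its row number.
    key_to_row = {}
    for row_idx in range(header_rows + 1, master_ws.max_row + 1):
        key = master_ws.cell(row=row_idx, column=key_col).value
        if key is not None:
            key_to_row[key] = row_idx

    for values in child_ws.iter_rows(min_row=header_rows + 1, values_only=True):
        key = values[key_col - 1]
        if key is None:
            continue  # skip child rows that have no key
        if key in key_to_row:
            # Match: overwrite every column of the existing master row.
            for col_idx, value in enumerate(values, start=1):
                master_ws.cell(row=key_to_row[key], column=col_idx).value = value
        else:
            # No match: append at the end of the master sheet.
            master_ws.append(list(values))
            key_to_row[key] = master_ws.max_row


# Small in-memory demo (hypothetical data, no files touched).
master_wb, child_wb = Workbook(), Workbook()
master_ws, child_ws = master_wb.active, child_wb.active
for row in (["A", "B", "C", "D"], ["a1", "b1", 1, 400], ["a2", "b2", 2, 500]):
    master_ws.append(row)
for row in (["A", "B", "C", "D"], ["x2", "y2", 2, 8], ["x6", "y6", 6, 9]):
    child_ws.append(row)

upsert_rows(master_ws, child_ws)
for row in master_ws.iter_rows(values_only=True):
    print(row)
```

With the real files, the two worksheets would be Master["Sheet1"] and Child["Sheet1"] from the question, followed by Master.save("Final.xlsx").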

The example below demonstrates how you can take two DataFrames (both of which could be created with .read_excel() in pandas), set the column you want to match on as the index, update the original with the matching rows from the second DataFrame, and then write the result to .xlsx again.

import pandas as pd

# df = pd.read_excel('myfile1.xlsx')
df = pd.DataFrame({'C': [1, 2, 3],
                   'D': [400, 500, 600]})
# new_df = pd.read_excel('myfile2.xlsx')
new_df = pd.DataFrame({'C': [1, 2, 6],
                       'D': [7, 8, 9]})

# Index both frames by the key column so update() aligns rows on 'C'
df.set_index('C', inplace=True)
df.update(new_df.set_index('C'))

df.reset_index().to_excel('updated.xlsx', index=False)

Output

    C   D
0   1   7.0
1   2   8.0
2   3   600.0
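
Note that DataFrame.update() only changes rows whose index already exists in the original, so the unmatched row (C = 6) is not added. One possible way to also append the unmatched rows is sketched below. It assumes the key values are unique in each frame and that both frames share the same columns; the function name upsert is hypothetical, and here 'C' is a column name rather than the Excel column letter.

```python
import pandas as pd


def upsert(master: pd.DataFrame, child: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return master with matching rows overwritten and new child rows appended."""
    master_idx = master.set_index(key)
    child_idx = child.set_index(key)

    # Keys present in both frames: overwrite those master rows with child values.
    matched = child_idx.index.intersection(master_idx.index)
    master_idx.loc[matched] = child_idx.loc[matched]

    # Keys only present in child: append them after the existing master rows.
    new_rows = child_idx[~child_idx.index.isin(master_idx.index)]
    result = pd.concat([master_idx, new_rows]).reset_index()

    # Restore the original column order of the master frame.
    return result[list(master.columns)]


master = pd.DataFrame({'C': [1, 2, 3], 'D': [400, 500, 600]})
child = pd.DataFrame({'C': [1, 2, 6], 'D': [7, 8, 9]})
result = upsert(master, child, key='C')
print(result)
```

The result can then be written out with result.to_excel('updated.xlsx', index=False), as in the example above.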
