How to compare column values of one excel file to the column values of another excel file in Python using openpyxl?

I am able to read the column data of two Excel files. My code is below:

from openpyxl import load_workbook

book = load_workbook("Book1.xlsx")
book2 = load_workbook("Book2.xlsx")

sheets = book['Sheet1']
anotherSheet = book2["sheet1"]

for val1 in sheets:
    print(val1[0].value)

print("\n\n\n\n")

for val2 in anotherSheet:
    print(val2[0].value)

I need to compare each value in Book1's column against every value in Book2's column, but I am not sure how to perform the comparison. If a value matches, I want to add another column and put "Yes" in it; if it doesn't, I want to put "No". In other words, I just need to check whether the values in Book1's column exist in Book2's. Some help would be highly appreciated.

I don't know the full answer, but I guess you could read the values into arrays and compare them one by one.

I finally figured out the solution.

First, we need to create three lists: book1_list and book2_list to store the values from Book1 and Book2, and tempList to store the matched values.

from openpyxl import load_workbook

book = load_workbook("Book1.xlsx")
book2 = load_workbook("Book2.xlsx")

sheets = book['Sheet1']
anotherSheet = book2["sheet1"]
book1_list = []
book2_list = []
tempList = []

Next, we want to skip the column heading of the second sheet, so we store the rows after the header row in a new variable.

skip_Head_of_anotherSheet = anotherSheet[2: anotherSheet.max_row]

Then iterate through the sheets and append the values of the required column to their respective lists (in my case the index was 0, which means the first column).

for val1 in sheets:
    book1_list.append(val1[0].value)

for val2 in skip_Head_of_anotherSheet:
    book2_list.append(val2[0].value)

Check the first list for repetitions and remove any duplicate values.

book1_list = list(dict.fromkeys(book1_list))
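
As a small illustration of why dict.fromkeys is used here instead of set(): dictionary keys are unique and keep their insertion order, so the list is de-duplicated without being reordered. The sample values below are made up.

```python
# Hypothetical sample values standing in for a column read from Book1.
book1_values = ["Name", "alice", "bob", "alice", None, "bob"]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while preserving first-seen order.
unique_values = list(dict.fromkeys(book1_values))

print(unique_values)
```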

Store the lengths of the lists for debugging purposes.

length_of_firstList = len(book1_list)
length_of_secondList = len(book2_list)

Next, iterate through both lists and, whenever two values match, store the matched value in tempList.

for i in book1_list:
    for j in book2_list:
        if i == j:
            tempList.append(j)
            #print(j)
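
The nested loop above compares every pair of values, which can get slow on large sheets. Here is a minimal sketch of an alternative, assuming the cell values are hashable (strings, numbers, dates, None): put the second list in a set and test membership. find_matches is a hypothetical helper name, not part of openpyxl, and the sample data is invented. Unlike the nested loop, it does not add a value again for each repeat in the second list.

```python
def find_matches(first_values, second_values):
    """Return the values of first_values that also occur in second_values."""
    lookup = set(second_values)  # average O(1) membership tests
    return [value for value in first_values if value in lookup]


# Made-up sample data in place of book1_list and book2_list.
matches = find_matches(["a", "b", "c"], ["c", "x", "a", "a"])
print(matches)
```

The result keeps the order of the first list, so it could be assigned to tempList in place of the nested loop.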

Now it's time to edit the Excel sheet. We iterate through the matched values stored in tempList and look for them in the first column of the actual sheet. When we find the same value, we write YES into the 4th column of the sheet (column 'D') on that row, using the row's index. Additionally, if the cell in column 'D' of a row is still blank, we write NO.

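for temp in tempList:
    for pointValue in skip_Head_of_anotherSheet:
        row = pointValue[0].row
        if temp == pointValue[0].value:
            anotherSheet.cell(column=4, row=row, value="YES")
            #print(row)

        # Look up column D by coordinate: pointValue only spans the columns
        # that existed when the rows were read, so pointValue[3] can raise
        # IndexError on a sheet with fewer than four columns.
        if anotherSheet.cell(column=4, row=row).value is None:
            anotherSheet.cell(column=4, row=row, value="NO")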
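
One thing to note about the loop above: rows are only marked NO while iterating over tempList, so if nothing matched at all, column 'D' stays empty. Below is a sketch of a single-pass alternative that decides YES or NO once per row. It works on plain lists so it can be run without a workbook; yes_no_flags is a hypothetical helper name, and the sample values are invented.

```python
def yes_no_flags(column_values, lookup_values):
    """Return "YES" or "NO" per value, depending on whether it is in lookup_values."""
    lookup = set(lookup_values)
    return ["YES" if value in lookup else "NO" for value in column_values]


# Made-up data: the first column of Book2 (header skipped) and Book1's values.
book2_column = ["a", "q", "c"]
book1_values = ["a", "b", "c"]

flags = yes_no_flags(book2_column, book1_values)
print(flags)
```

Each flag could then be written back with anotherSheet.cell(column=4, row=row_number, value=flag), where row_number starts at 2 because of the header row. This overwrites whatever is already in column 'D', whereas the loop above only writes NO into blank cells.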

Finally, we add a header to the newly populated column, save the workbook, and print some information for debugging purposes.

anotherSheet.cell(column=4, row=1, value="PII")
book2.save("Book2.xlsx")

print("SUCCESSFULLY UPDATED THE EXCEL SHEET")
print("Length of First List = ", length_of_firstList)
print("Length of Second List = ", length_of_secondList)

I hope this will help someone with the same issue.
