How to compare two Excel workbooks with multiple tabs using pandas and numpy?

Question

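I have two xlsx files, each with multiple tabs. I need to compare the values in each tab, matching tabs by name (for example, Sheet1 in file1 needs to be compared with Sheet1 in file2, and so on). When I use the code below, it only compares and writes the first sheet. Please help me figure out why not all tabs are being compared.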

import pandas as pd
import numpy as np

df1 = pd.read_excel('test_1.xlsx', sheet_name=None)
df2 = pd.read_excel('test_2.xlsx', sheet_name=None)

with pd.ExcelWriter('./Excel_diff.xlsx') as writer:
    for sheet, df1 in df1.items():
        # check if sheet is in the other Excel file
        if sheet in df2:
            df2 = df2[sheet]
            comparison_values = df1.values == df2.values

            print(comparison_values)

            rows, cols = np.where(comparison_values == False)
            for item in zip(rows, cols):
                df1.iloc[item[0], item[1]] = '{} → {}'.format(df1.iloc[item[0], item[1]], df2.iloc[item[0], item[1]])

            df1.to_excel(writer, sheet_name=sheet, index=False, header=True)

The test_1 file is set up as follows.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|1     |1     |1     |
|1     |1     |1     |
|1     |1     |1     |

Sheet2

|test 1|test 2|test 3|
|------|------|------|
|3     |3     |3     |
|3     |3     |3     |
|3     |3     |3     |

The test_2 file is set up as follows.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|2     |2     |2     |
|2     |2     |2     |
|2     |2     |2     |

Sheet2

|test 1|test 2|test 3|
|------|------|------|
|4     |4     |4     |
|4     |4     |4     |
|4     |4     |4     |

This is the output I get with the code above.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |

If I align the df1.to_excel call with the "if" statement (that is, dedent it by one level), this is the output I get.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |

Sheet2

|test 1|test 2|test 3|
|------|------|------|
|3     |3     |3     |
|3     |3     |3     |
|3     |3     |3     |

This is the output I want, which shows the differences in values for each sheet name.

Sheet1

|test 1|test 2|test 3|
|------|------|------|
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |
|1 → 2 |1 → 2 |1 → 2 |

Sheet2

|test 1|test 2|test 3|
|------|------|------|
|3 → 4 |3 → 4 |3 → 4 |
|3 → 4 |3 → 4 |3 → 4 |
|3 → 4 |3 → 4 |3 → 4 |

Thanks!

Answer 1

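With a colleague's help, I was able to fix the Excel sheet comparison code. Inside the "if" block, df2 was being overwritten: after the first sheet, df2 no longer held the dictionary of all sheets but a single DataFrame, so the check "sheet in df2" failed for every later sheet. I renamed the variable inside the "if" block from df2 to df2sheet, and now it works fine.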
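The failure can be reproduced without any Excel files. The following is a minimal sketch that assumes made-up in-memory data in place of the real workbooks; the name sheets and its contents are hypothetical. It shows that the "in" operator tests sheet names on the dict but column labels on a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., sheet_name=None),
# which returns a dict mapping sheet names to DataFrames.
sheets = {
    "Sheet1": pd.DataFrame({"test 1": [2, 2, 2]}),
    "Sheet2": pd.DataFrame({"test 1": [4, 4, 4]}),
}

df2 = sheets
in_dict = "Sheet2" in df2    # dict membership tests the keys (sheet names)

df2 = df2["Sheet1"]          # df2 is now a single DataFrame, as in the original loop
in_frame = "Sheet2" in df2   # DataFrame membership tests the column labels

print(in_dict, in_frame)
```

Once df2 has been reassigned, the second check no longer looks at sheet names at all, which is why the later sheets were silently skipped.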

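import pandas as pd
import numpy as np

# sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
df1 = pd.read_excel('test_1.xlsx', sheet_name=None)
df2 = pd.read_excel('test_2.xlsx', sheet_name=None)

with pd.ExcelWriter('./Excel_diff.xlsx') as writer:
    # use a separate loop variable so the dict in df1 is not shadowed
    for sheet, df1sheet in df1.items():
        # check if sheet is in the other Excel file
        if sheet in df2:
            # use a new name so the dict of sheets in df2 is not overwritten
            df2sheet = df2[sheet]
            comparison_values = df1sheet.values == df2sheet.values

            print(comparison_values)

            # cast to object so the 'old → new' strings can be stored in numeric columns
            df1sheet = df1sheet.astype(object)
            rows, cols = np.where(comparison_values == False)
            for item in zip(rows, cols):
                df1sheet.iloc[item[0], item[1]] = '{} → {}'.format(df1sheet.iloc[item[0], item[1]], df2sheet.iloc[item[0], item[1]])

            df1sheet.to_excel(writer, sheet_name=sheet, index=False, header=True)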
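The same comparison can also be wrapped in small functions, which removes the risk of reusing a variable name inside the loop. This is only a sketch under stated assumptions: the helper names annotate_differences and diff_workbooks are hypothetical, the sample data is made up to mirror the tables in the question, and matching sheets are assumed to have the same shape. Note that NaN cells compare as different from each other with this approach.

```python
import numpy as np
import pandas as pd


def annotate_differences(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `left` where each differing cell reads 'old → new'."""
    if left.shape != right.shape:
        raise ValueError("sheets must have the same shape to compare cell by cell")
    result = left.astype(object)  # object dtype so strings can replace numbers
    differs = left.to_numpy() != right.to_numpy()
    for row, col in zip(*np.where(differs)):
        result.iloc[row, col] = f"{left.iloc[row, col]} → {right.iloc[row, col]}"
    return result


def diff_workbooks(sheets1: dict, sheets2: dict) -> dict:
    """Compare sheets that exist in both workbooks, matched by sheet name."""
    return {
        name: annotate_differences(frame, sheets2[name])
        for name, frame in sheets1.items()
        if name in sheets2
    }


# Made-up data standing in for pd.read_excel(..., sheet_name=None)
sheets1 = {
    "Sheet1": pd.DataFrame({"test 1": [1, 1], "test 2": [1, 1]}),
    "Sheet2": pd.DataFrame({"test 1": [3, 3], "test 2": [3, 3]}),
    "Extra": pd.DataFrame({"test 1": [9]}),  # only in the first workbook, so skipped
}
sheets2 = {
    "Sheet1": pd.DataFrame({"test 1": [2, 2], "test 2": [2, 2]}),
    "Sheet2": pd.DataFrame({"test 1": [4, 4], "test 2": [4, 4]}),
}

diffs = diff_workbooks(sheets1, sheets2)
for name, frame in diffs.items():
    print(name)
    print(frame)
```

Each DataFrame in diffs can then be written with to_excel inside a pd.ExcelWriter block, as in the answer above.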

Solution 1
0 2022-05-02 19:47:39
