Problem combining multiple Excel files in Python pandas

Question

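I am new to Python programming. I need to merge more than 1,000 files into a single file. Each file has 3 sheets, and I only need the data from sheet2 to build the final Excel file. My problem is picking a value from specific cells on sheet2 of each Excel file and turning it into a column. Python picks the values from the first file and creates the column from that file.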

    df = pd.DataFrame()
            
    for file in files:
        if file.endswith('.xlsm'):
            df = pd.read_excel(file, sheet_name=1, header=None) 
            df['REPORT_NO'] = df.iloc[1][4] #Report Number
            df['SUPPLIER'] = df.iloc[2][4] #Supplier
            df['REPORT_DATE'] = df.iloc[0][4] #Report Date
        df2 = df2.dropna(thresh=15)
        df2 = df.append(df, ignore_index=True)
        df = df.reset_index()
        del df['index']
    df2.to_excel('FINAL_FILES.xlsx')

How can I fix this so that Python takes the information from each Excel file and puts it on the correct rows?

Answer 1

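I. Check your indexing. Positions are zero-based, so df.iloc[2][4] is the third row and fifth column, and sheet_name=1 reads the second sheet (sheet2), not the first. Only that one sheet is ever loaded, even though you mention that every .xlsm has 3 sheets, so make sure the cells you pick really hold the values you expect.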
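Since pandas positions are zero-based, it can help to confirm which cell an expression picks before running it over many files. A small sketch, assuming a made-up grid whose cells spell out their own row and column (the name `grid` is illustrative only, not from the question):

```python
import pandas as pd

# Hypothetical 4x6 grid; default integer column labels mimic header=None.
grid = pd.DataFrame([[f"r{r}c{c}" for c in range(6)] for r in range(4)])

# Row position 2 is the third row; column label 4 is the fifth column.
cell = grid.iloc[2][4]
print(cell)
```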

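II. Your scope is probably wrong. Why define df outside the loop? It changes with every file, so the outer definition is not needed. Everything gathered inside the loop should be put into df2 before the next iteration starts.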

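III. Did you check whether append adds rows or columns?
Although

    df['REPORT_NO'] = df.iloc[1][4] #Report Number
    df['SUPPLIER'] = df.iloc[2][4] #Supplier
    df['REPORT_DATE'] = df.iloc[0][4] #Report Date

are written as columns, each one repeats the same report number, supplier, or report date on every row of that column.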
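To see that repetition in isolation, here is a minimal sketch with a made-up three-row frame (the column name `value` and the string 'ACME' are placeholders, not data from the question). Assigning a single value to a new column fills every row with it:

```python
import pandas as pd

# Hypothetical stand-in for one sheet's data rows.
df = pd.DataFrame({'value': [10, 20, 30]})

# A scalar assigned to a column is broadcast to every row of the frame.
df['SUPPLIER'] = 'ACME'
suppliers = df['SUPPLIER'].tolist()
print(suppliers)
```

That is usually what you want when each file carries one report number, as long as the assignment happens on that file's own frame before the frames are combined.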

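Check the output of df2 = df.append(df, ignore_index=True). It probably does not append the way you want: that call appends df to itself and overwrites df2, so the rows from earlier files are lost.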
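Putting these points together, one possible restructuring is sketched below. It is not the asker's final code: the helper names `tag_sheet` and `combine`, the `min_non_na` parameter, and the choice to scan a folder with `glob` are assumptions made for illustration. It also uses `pd.concat` on a list of frames instead of `DataFrame.append`, which was removed in pandas 2.0. Reading .xlsm files is assumed to work through the openpyxl engine.

```python
from pathlib import Path

import pandas as pd


def tag_sheet(raw: pd.DataFrame, min_non_na: int = 15) -> pd.DataFrame:
    """Copy the three header cells into columns, then drop sparse rows."""
    out = raw.copy()
    out['REPORT_NO'] = raw.iat[1, 4]    # row 2, column 5 of the sheet
    out['SUPPLIER'] = raw.iat[2, 4]     # row 3, column 5
    out['REPORT_DATE'] = raw.iat[0, 4]  # row 1, column 5
    # Assumption: sparse rows are dropped per file, after tagging.
    return out.dropna(thresh=min_non_na)


def combine(folder: str) -> pd.DataFrame:
    """Read the second sheet of every .xlsm in folder and stack the rows."""
    frames = []
    for path in sorted(Path(folder).glob('*.xlsm')):
        raw = pd.read_excel(path, sheet_name=1, header=None)
        frames.append(tag_sheet(raw))
    if not frames:
        return pd.DataFrame()  # nothing matched; avoid concat on an empty list
    return pd.concat(frames, ignore_index=True)
```

With this in place, something like combine('.').to_excel('FINAL_FILES.xlsx', index=False) would write the combined file. Each file gets its own frame inside the loop, so the report number, supplier, and date stay attached to the rows they came from.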

Solution 1
0 2021-04-03 20:46:33
