Problem with combining multiple Excel files in Python pandas

I am quite new to Python programming. I need to combine 1000+ files into one file. Each file has 3 sheets, and I need the data only from sheet2 to build a final Excel file. My problem is picking a value from a specific cell on sheet2 of each Excel file and creating a column from it. Python picks the value from the first file and creates the column from that.

    df = pd.DataFrame()
            
    for file in files:
        if file.endswith('.xlsm'):
            df = pd.read_excel(file, sheet_name=1, header=None) 
            df['REPORT_NO'] = df.iloc[1][4] #Report Number
            df['SUPPLIER'] = df.iloc[2][4] #Supplier
            df['REPORT_DATE'] = df.iloc[0][4] #Report Date
        df2 = df2.dropna(thresh=15)
        df2 = df.append(df, ignore_index=True)
        df = df.reset_index()
        del df['index']
    df2.to_excel('FINAL_FILES.xlsx')

How can I solve this so that Python takes the values from each Excel file and puts the information on the right rows?

I. df.iloc[2][4] refers to the third row and fifth column of the sheet you read, because positions are zero-based. You import with sheet_name=1 and never read any other sheet. Sheet positions are zero-based too, so that is the second sheet of each .xlsm; check that it really is the sheet2 you mean among the 3 sheets.
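To make the zero-based positions concrete, here is a minimal sketch. The small grid is a made-up stand-in for a sheet read with header=None, not data from the question.

```python
import pandas as pd

# Hypothetical 4x6 grid standing in for a sheet read with header=None.
# Each cell records its own zero-based row and column position.
sheet = pd.DataFrame([[f"r{r}c{c}" for c in range(6)] for r in range(4)])

# iloc[2] is the third row; with header=None the column labels are 0..5,
# so [4] then picks the fifth column.
value = sheet.iloc[2][4]

# The same cell addressed in a single positional step.
same_value = sheet.iloc[2, 4]

print(value, same_value)
```

The single-step form iloc[2, 4] avoids building an intermediate row and is usually easier to read.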

II. Your scoping could be wrong. Why define df outside of the loop? It changes with every file, so there is no need for an external one. All the information from one pass of the loop should be put into df2 before the next iteration begins.
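One way to picture this, as a rough sketch only: keep the per-file frame local to the loop and store it in a list before the next pass. The file names, the SOURCE column, and the tiny frames below are hypothetical placeholders for what read_excel would return.

```python
import pandas as pd

# Hypothetical per-file frames, keyed by made-up file names.
per_file = {
    "a.xlsm": pd.DataFrame({"value": [1, 2]}),
    "b.xlsm": pd.DataFrame({"value": [3]}),
    "c.xlsm": pd.DataFrame({"value": [4, 5, 6]}),
}

collected = []
for name, frame in per_file.items():
    # 'tagged' belongs to this pass only; it is stored before the loop moves on.
    tagged = frame.assign(SOURCE=name)
    collected.append(tagged)

# Stack everything once, after the loop, instead of growing a frame inside it.
combined = pd.concat(collected, ignore_index=True)
print(combined)
```

Concatenating once at the end also tends to be faster than growing a DataFrame on every iteration, which matters with 1000+ files.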

III. Have you checked whether append is adding a row or a column?
Even though

    df['REPORT_NO'] = df.iloc[1][4] #Report Number
    df['SUPPLIER'] = df.iloc[2][4] #Supplier
    df['REPORT_DATE'] = df.iloc[0][4] #Report Date

are written as columns, they have the Report Number/Supplier/Report Date repeated for every row in that column.
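The repetition comes from how pandas handles assigning a single value to a column. A minimal sketch, with a made-up report number:

```python
import pandas as pd

# Hypothetical data block from one file.
block = pd.DataFrame({"value": [10, 20, 30]})

# Assigning a scalar fills the whole column with that one value.
block["REPORT_NO"] = "R-001"  # made-up report number

print(block)
```

For this task the repetition is probably what you want: once the files are stacked, each row still carries the report number of the file it came from.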

When you use df2 = df.append(df, ignore_index=True), check the output. It might not be appending in the way you intend: as written, it appends df to itself and overwrites df2 on every pass, so nothing from earlier files is kept.
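Putting the three points together, a corrected loop might look like the sketch below. It rests on several assumptions: combine_reports and its reader parameter are hypothetical names introduced here, the cell positions and the dropna threshold are copied from the question, and the filter is applied after the three columns are added because that is the order in the question's code. It uses pd.concat because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd


def combine_reports(paths, reader=pd.read_excel):
    """Stack the second sheet of every .xlsm file in 'paths' into one frame.

    'reader' is a hypothetical hook so the function can be tried
    without real Excel files; by default it is pd.read_excel.
    """
    frames = []
    for path in paths:
        if not str(path).endswith(".xlsm"):
            continue
        # sheet_name=1 is the second sheet (zero-based); header=None keeps raw cells.
        data = reader(path, sheet_name=1, header=None)
        # Cell positions taken from the question; the new columns are added
        # at the end, so the positions read below are not shifted.
        data["REPORT_NO"] = data.iloc[1, 4]
        data["SUPPLIER"] = data.iloc[2, 4]
        data["REPORT_DATE"] = data.iloc[0, 4]
        # Same threshold as the question: keep rows with at least 15 non-empty cells.
        data = data.dropna(thresh=15)
        frames.append(data)
    if not frames:
        return pd.DataFrame()
    # One concat after the loop; ignore_index renumbers the rows 0..n-1.
    return pd.concat(frames, ignore_index=True)
```

With real files, something like combine_reports(files).to_excel('FINAL_FILES.xlsx', index=False) would write the result, where files is the list from the question. Whether the threshold should count the three added columns depends on your data, so check a few files by hand.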
