Problem with combining multiple Excel files in Python pandas
I am quite new to Python programming. I need to combine 1000+ files into one file. Each file has 3 sheets, and I need to take data only from sheet2 and build a final Excel file. My problem is picking a value from a specific cell on sheet2 of each Excel file and creating a column from it. Python picks the value from the first file and creates the column from that.
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsm'):
        df = pd.read_excel(file, sheet_name=1, header=None)
        df['REPORT_NO'] = df.iloc[1][4] #Report Number
        df['SUPPLIER'] = df.iloc[2][4] #Supplier
        df['REPORT_DATE'] = df.iloc[0][4] #Report Number
        df2 = df2.dropna(thresh=15)
        df2 = df.append(df, ignore_index=True)
df = df.reset_index()
del df['index']
df2.to_excel('FINAL_FILES.xlsx')
How can I solve this issue so that Python takes the values from each Excel file and puts the information on the right rows?
I df.iloc[2][4] refers to the third row and fifth column of the loaded sheet, because positions are zero-based. You imported with sheet_name=1, which is also zero-based and so loads the second sheet, and you never read a different one, though you mentioned that all of the .xlsm files have 3 sheets. Check that the cell and the sheet are the ones you intend.
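To make the indexing concrete, here is a minimal sketch that uses a small in-memory frame as a hypothetical stand-in for a sheet read with header=None (the cell labels such as "r2c4" are invented for illustration). It shows that positions count from zero and that the chained form and the single-step form address the same cell. The integer passed to sheet_name in pd.read_excel is counted from zero in the same way.

```python
import pandas as pd

# Hypothetical stand-in for a sheet read with header=None:
# rows and columns are both labelled 0, 1, 2, ...
raw = pd.DataFrame(
    [[f"r{r}c{c}" for c in range(6)] for r in range(4)]
)

# iloc[2] selects the third row; [4] then looks up label 4 in that row,
# which is the fifth column because header=None gives integer labels.
chained = raw.iloc[2][4]

# The same cell addressed in one step, purely by position.
direct = raw.iloc[2, 4]

print(chained, direct)
```

The single-step form raw.iloc[2, 4] is usually the safer habit, since it does not depend on the column labels being integers.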
II Your scoping could be wrong. Why define df outside of the loop? It will change per file, so there is no need for an external one. All info from the loop should be put into your df2 before the next iteration of the loop.
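One way to follow this advice is sketched below; it is not the only possible layout. Each file's frame is built inside the loop, collected in a list, and stacked once at the end. The helper names tag_sheet, combine and fake_sheet are hypothetical, and the in-memory frames stand in for what pd.read_excel(file, sheet_name=1, header=None) would return, so the sketch runs without any .xlsm files.

```python
import pandas as pd

def tag_sheet(raw):
    """Copy the three header cells of one raw sheet into new columns."""
    out = raw.copy()
    out["REPORT_DATE"] = raw.iloc[0, 4]
    out["REPORT_NO"] = raw.iloc[1, 4]
    out["SUPPLIER"] = raw.iloc[2, 4]
    return out

def combine(raw_sheets):
    """Tag every sheet separately, then stack all of them row-wise."""
    frames = [tag_sheet(raw) for raw in raw_sheets]
    return pd.concat(frames, ignore_index=True)

def fake_sheet(date, number, supplier):
    """Hypothetical 5x6 sheet with the header cells in column 4."""
    rows = [[f"cell{r}{c}" for c in range(6)] for r in range(5)]
    rows[0][4], rows[1][4], rows[2][4] = date, number, supplier
    return pd.DataFrame(rows)

combined = combine([
    fake_sheet("2021-01-01", "R1", "Acme"),
    fake_sheet("2021-02-01", "R2", "Globex"),
])
print(combined[["REPORT_NO", "SUPPLIER"]])
```

With real files, the list passed to combine would presumably come from reading each path that ends in '.xlsm'. The dropna(thresh=15) step from the question could be applied to each frame inside tag_sheet, and the result written once with combined.to_excel('FINAL_FILES.xlsx').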
III Have you checked whether append is adding a row or a column?
Even though
df['REPORT_NO'] = df.iloc[1][4] #Report Number
df['SUPPLIER'] = df.iloc[2][4] #Supplier
df['REPORT_DATE'] = df.iloc[0][4] #Report Number
are written as columns, they have the Report Number/Supplier/Report Date repeated for every row in that column.
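The repetition comes from assigning a single value to a column, which pandas broadcasts to every row. A small sketch with made-up data (the value "R-001" and the frame layout are assumptions for illustration):

```python
import pandas as pd

# Hypothetical miniature sheet: column 4 holds the report number in row 0.
raw = pd.DataFrame({0: ["a", "b", "c"], 4: ["R-001", "x", "y"]})

# Assigning one scalar to a column repeats it on every row of this frame.
raw["REPORT_NO"] = raw.iloc[0][4]

print(raw["REPORT_NO"].tolist())  # ['R-001', 'R-001', 'R-001']
```

This is fine as long as the assignment is done on each file's own frame before the frames are combined; each block of rows then carries the value from its own file.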
When you use df2 = df.append(df, ignore_index=True), check the output. It might not be appending in the way you intend.
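A quick way to check is to compare shapes on tiny frames. The sketch below uses pd.concat rather than DataFrame.append, because append was deprecated in pandas 1.4 and removed in 2.0; the frame names and values are invented for illustration. Note that df.append(df, ...) stacks a frame onto itself, which is what the first call imitates.

```python
import pandas as pd

first = pd.DataFrame({"value": [1, 2]})
second = pd.DataFrame({"value": [3]})

# Equivalent of df.append(df, ignore_index=True): the frame is doubled.
doubled = pd.concat([first, first], ignore_index=True)

# Stacking two different frames adds rows (the default, axis=0).
stacked = pd.concat([first, second], ignore_index=True)

# axis=1 places the frames side by side and adds columns instead.
side_by_side = pd.concat([first, second], axis=1)

print(doubled.shape, stacked.shape, side_by_side.shape)
```

If the combined result only ever contains rows from one file, the likely cause is that the accumulated frame is overwritten on each iteration instead of being extended.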