I am quite new to python programming. I need to combine 1000+ files into one file. each file has 3 sheets in it and I need to get data only from sheet2 and make an final excel file. I am facing a problem to pick a value from specific cell from each excel file on sheet2 and create a column. python is picking the value from first file and create a column on that
df = pd.DataFrame()
for file in files:
if file.endswith('.xlsm'):
df = pd.read_excel(file, sheet_name=1, header=None)
df['REPORT_NO'] = df.iloc[1][4] #Report Number
df['SUPPLIER'] = df.iloc[2][4] #Supplier
df['REPORT_DATE'] = df.iloc[0][4] #Report Number
df2 = df2.dropna(thresh=15)
df2 = df.append(df, ignore_index=True)
df = df.reset_index()
del df['index']
df2.to_excel('FINAL_FILES.xlsx')
How can I solve this issue so python can take from each excel and put the information on right rows.
I df.iloc[2][4]
refers to the 2nd row and 4th column of the 1st sheet. You have imported with sheet_name=1
and never activated a different sheet, though you mentioned all of the .xlsm
have 3 sheets.
II your scoping could be wrong. Why define df
outside of the loop? If will change per file, so no need for an external one. All info form the loop should be put into your df2
before the next iteration of the loop.
III Have you checked if append
is adding a row or a column?
Even though
df['REPORT_NO'] = df.iloc[1][4] #Report Number
df['SUPPLIER'] = df.iloc[2][4] #Supplier
df['REPORT_DATE'] = df.iloc[0][4] #Report Number
are written as columns they have Report Number/Supplier/Report Date repeated for every row in that column.
When you use df2 = df.append(df, ignore_index=True)
check the output. It might not be appending in the way you intend.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.