This is my first time using pandas. I have multiple Excel files that I want to combine into one Excel file using Python and pandas.
I managed to merge the contents of the first sheet of each Excel file into one sheet in a new Excel file, as shown in the figure below: [Figure: combined sheets in one sheet]
I wrote this code to do it:
import glob
import pandas as pd
path = "C:/folder"
file_identifier = "*.xls"
all_data = pd.DataFrame()
for f in glob.glob(path + "/*" + file_identifier):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
writer = pd.ExcelWriter('combined.xls', engine='xlsxwriter')
all_data.to_excel(writer, sheet_name='Summary Sheet')
writer.save()
file_df = pd.read_excel("C:/folder/combined.xls")
# Keep only FIRST record from set of duplicates
file_df_first_record = file_df.drop_duplicates(subset=["Test summary", "Unnamed: 1", "Unnamed: 2",
                                                       "Unnamed: 3"], keep="first")
file_df_first_record.to_excel("filtered.xls", index=False, sheet_name='Summary Sheet')
But I have two issues:
All worksheets in one Excel file
So I managed to combine worksheet 1 from all the Excel files into one sheet, but now I want to copy worksheets A, B, C, D, and E into a single Excel file that also contains all the remaining worksheets from the other Excel files.
Each of my Excel files looks like this: [Figure: single excel file]
If you want to have all data gathered together in one worksheet you can use the following script:
Put all Excel workbooks (i.e. Excel files) to be processed into a folder (see the variable paths).
Get the paths of all workbooks in that folder using glob.glob.
Return all worksheets of each workbook with read_excel(path, sheet_name=None) and prepare them for merging.
Merge all worksheets with concat.
Export the final output with to_excel.
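import glob
import pandas as pd

# All workbooks to process, and where to save the result
paths = glob.glob(r"C:\excelfiles\*.xlsx")
path_save = r"finished.xlsx"

# sheet_name=None returns a dict {sheet name: DataFrame} for each workbook
df_lst = [pd.read_excel(path, sheet_name=None).values() for path in paths]
# Move each sheet's header into its first row, so that sheets with
# different column names can be stacked by column position
df_lst = [y.transpose().reset_index().transpose() for x in df_lst for y in x]
df_result = pd.concat(df_lst, ignore_index=True)
# The headers are already stored as ordinary rows, so do not write them again
df_result.to_excel(path_save, index=False, header=False)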
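If you would rather keep the worksheets separate (one sheet A, one sheet B, and so on in a single output workbook), a variation along the following lines may work. This is only a sketch: it assumes that worksheets with the same name share the same columns across workbooks, and the helper names group_sheets_by_name and combine_workbooks, the glob pattern, and the output file name are all made up for illustration.

```python
import glob

import pandas as pd


def group_sheets_by_name(workbooks):
    """Combine same-named worksheets from several workbooks.

    `workbooks` is an iterable of dicts mapping sheet name -> DataFrame,
    which is what pd.read_excel(path, sheet_name=None) returns per file.
    Returns one concatenated DataFrame per sheet name.
    """
    grouped = {}
    for sheets in workbooks:
        for name, df in sheets.items():
            grouped.setdefault(name, []).append(df)
    return {
        name: pd.concat(frames, ignore_index=True)
        for name, frames in grouped.items()
    }


def combine_workbooks(pattern, path_save):
    """Hypothetical helper: read every workbook matching `pattern` and
    write one output workbook with one sheet per worksheet name."""
    workbooks = [pd.read_excel(path, sheet_name=None) for path in glob.glob(pattern)]
    combined = group_sheets_by_name(workbooks)
    if not combined:
        raise ValueError("no worksheets found for pattern: " + pattern)
    # The context manager saves and closes the file on exit
    with pd.ExcelWriter(path_save) as writer:
        for name, df in combined.items():
            df.to_excel(writer, sheet_name=name, index=False)


# Example call (placeholder paths, adjust to your folder):
# combine_workbooks(r"C:\excelfiles\*.xlsx", "combined_by_sheet.xlsx")
```

Here the column headers are kept, because each output sheet only stacks worksheets that are assumed to have the same layout. If that assumption does not hold for your files, pd.concat will align by column name and fill the gaps with NaN.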