简体   繁体   中英

Python: How to copy Excel worksheet from multiple Excel files to one Excel file that contains all the worksheets from other Excel files

It's my first time to use pandas, I have multiple excel files, that i want to combine all into one Excel file using python pandas.

I managed to merge the content of the first sheets in each excel file into one sheet in a new excel file like this shown in the figure below: combined sheets in one sheet

I wrote this code to implement this:

import glob
import pandas as pd
path = "C:/folder"
file_identifier = "*.xls"
all_data = pd.DataFrame()
for f in glob.glob(path + "/*" + file_identifier):
   df = pd.read_excel(f)
   all_data = all_data.append(df,ignore_index=True)

writer = pd.ExcelWriter('combined.xls', engine='xlsxwriter')    
all_data.to_excel(writer, sheet_name='Summary Sheet') 
writer.save()
file_df = pd.read_excel("C:/folder/combined.xls")
# Keep only FIRST record from set of duplicates
file_df_first_record = file_df.drop_duplicates(subset=["Test summary", "Unnamed: 1", "Unnamed: 2", 
"Unnamed: 3"], keep="first")
file_df_first_record.to_excel("filtered.xls", index=False, sheet_name='Summary Sheet')

But I have two issues:

  1. How to remove cells that has "Unnamed" as shown in the previous figure
  2. How to copy other worksheets (the second worksheet in each Excel file, not the first worksheet) from all other Excel files and put it in one Excel file with multiple worksheets and with different students names like shown in the picture.

all worksheets in one excel file

So i managed to combine worksheet1 from all Excel files in one sheet, but now I want to copy A, B, C, D, E worksheets into one Excel file that has all other remaining worksheets in other Excel files.

Each Excel file of the ones I have looks like this single excel file

If you want to have all data gathered together in one worksheet you can use the following script:

  1. Put all excel workbooks (ie excel files) to be processed into a folder (see variable paths ).

  2. Get the paths of all workbooks in that folder using glob.glob .

  3. Return all worksheets of each workbook with read_excel(path, sheet_name=None) and prepare them for merging.

  4. Merge all worksheets with concat .

  5. Export the final output to_excel .

     import pandas as pd import glob paths = glob.glob(r"C:\excelfiles\*.xlsx") path_save = r"finished.xlsx" df_lst = [pd.read_excel(path, sheet_name=None).values() for path in paths] df_lst = [y.transpose().reset_index().transpose() for x in df_lst for y in x] df_result = pd.concat(df_lst, ignore_index=True) df_result.to_excel(path_save, index=False, header=False)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM