I had trouble finding useful information on how to open multiple Excel (.xls) files in Python, combine them into a single dataframe, and then save the result as a single Excel (.xlsx) file.
I found a way that works for me, and I thought I'd share it with all the helpful people on Stack Overflow!
The solution I found is as follows:
Be sure to install the required packages in Anaconda so that the imports work. pandas also needs an Excel writer package such as openpyxl to save .xlsx files.
import os
import glob2
import pandas as pd
import xlrd  # not called directly; pandas uses it as the engine for reading .xls files
*** is the directory where your files are located
os.chdir("/***/Extracted Individual Excel Files/")
"combined_df" is what I named the dataframe, but you can name it whatever, just be sure to reflect the change in your code...
# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate them once
frames = []
for f in glob2.glob("*.xls"):
    frames.append(pd.read_excel(f))
combined_df = pd.concat(frames, ignore_index=True)
At this point you should be able to type "combined_df" and press Enter to display the dataframe.
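If your files do not all share exactly the same columns, it helps to know how the combining step behaves. Here is a minimal sketch using pd.concat, which does the same job as the loop above. The two small in-memory frames stand in for what pd.read_excel would return; the file names, column names, and values are made up for illustration, and the source_file column is my own addition, not part of the original solution.

```python
import pandas as pd

# Hypothetical stand-ins for the dataframes pd.read_excel would return
# for two files. The second one has an extra "note" column.
frames = {
    "jan.xls": pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]}),
    "feb.xls": pd.DataFrame({"id": [3], "amount": [30.0], "note": ["late"]}),
}

# assign() returns a copy with a new column recording the originating file.
tagged = [df.assign(source_file=name) for name, df in frames.items()]

# ignore_index=True renumbers the rows from 0; a column that is missing
# from one of the frames is filled with NaN for that frame's rows.
combined = pd.concat(tagged, ignore_index=True)
print(combined)
```

Rows from files that lack a column end up with NaN in that column, so it is worth checking the printed result before saving.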
Use the following code to save your newly combined dataframe to a file of your choosing.
Again, *** is where you want the file saved, and @@@@ is the name you give the file.
combined_df.to_excel(r'/***/@@@@.xlsx', index=False)
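For anyone who wants the whole thing in one reusable piece, the steps above can be sketched as a single function. This is only one way to arrange it: the function name combine_excel_files and its parameters are names I made up, and it uses pathlib from the standard library instead of os.chdir and glob2. It assumes xlrd is installed for reading .xls files and an Excel writer such as openpyxl for saving .xlsx.

```python
from pathlib import Path

import pandas as pd


def combine_excel_files(src_dir, out_path, pattern="*.xls"):
    """Read every file in src_dir matching pattern and save one combined file.

    Hypothetical helper for illustration; returns the combined dataframe.
    """
    # sorted() makes the row order predictable from run to run.
    files = sorted(Path(src_dir).glob(pattern))
    if not files:
        # Fail early with a clear message instead of writing an empty file.
        raise FileNotFoundError(f"no files matching {pattern!r} in {src_dir}")

    frames = [pd.read_excel(f) for f in files]
    combined = pd.concat(frames, ignore_index=True)

    # index=False keeps the row numbers out of the saved spreadsheet.
    combined.to_excel(out_path, index=False)
    return combined
```

You would call it with your own paths, for example combine_excel_files("/***/Extracted Individual Excel Files/", "/***/@@@@.xlsx"), with *** and @@@@ filled in as described above.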
I hope this helps everyone. As a beginner, it was a pain to figure out!!!