I am trying to combine all sheets from all workbooks in a directory into a single DataFrame. I've tried with glob and with os.scandir, but either way I only get the first sheet of each workbook. First attempt:
import pandas as pd
import glob

workbooks = glob.glob(r"\mydirectory\*.xlsx")
list = []
for file in workbooks:
    df = pd.concat(pd.read_excel(file, sheet_name=None), ignore_index=True)
    list.append(df)
dataframe = pd.concat(list, axis=0)
Second attempt:
import os
import pandas as pd

df = pd.DataFrame()
path = r"\mydirectory"
with os.scandir(path) as files:
    for file in files:
        data = pd.read_excel(file, sheet_name=None)
        df = df.append(data)
I think the issue lies with the for loop, but I'm too inexperienced to pin down the problem. Any help would be greatly appreciated, thanks!
If I understand what you have written correctly, you want something like this:
import pandas as pd
import glob

# list of workbooks in the directory
workbooks = glob.glob(r"\mydirectory\*.xlsx")
l = []
# for each file in the list
for file in workbooks:
    # ExcelFile exposes the workbook's sheet names
    xl_file = pd.ExcelFile(file)
    # concatenate the DataFrames created from each sheet in the file
    df = pd.concat(
        [pd.read_excel(file, sheet_name=sheet) for sheet in xl_file.sheet_names],
        ignore_index=True,
    )
    # append to list
    l.append(df)
# concatenate all per-file DataFrames into one DataFrame
dataframe = pd.concat(l, axis=0)
This loops explicitly over the sheet names of each Excel file and concatenates one DataFrame per sheet. That is the only difference from what you had already written.
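If it helps to check whether every sheet is really being read, here is a minimal sketch of the same idea. It tags each row with the sheet and workbook it came from. The function names combine_sheets and combine_workbooks and the added columns "sheet" and "workbook" are my own hypothetical choices, not anything pandas provides. It relies on pd.read_excel(..., sheet_name=None) returning a dict that maps sheet names to DataFrames, and it assumes your data has no columns already called "sheet" or "workbook".

```python
from pathlib import Path

import pandas as pd


def combine_sheets(sheets):
    """Stack a {sheet_name: DataFrame} mapping into one DataFrame.

    A hypothetical "sheet" column records where each row came from.
    """
    frames = [frame.assign(sheet=name) for name, frame in sheets.items()]
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def combine_workbooks(directory, pattern="*.xlsx"):
    """Read every sheet of every matching workbook in `directory`."""
    frames = []
    for path in sorted(Path(directory).glob(pattern)):
        # sheet_name=None asks pandas for all sheets as a dict
        sheets = pd.read_excel(path, sheet_name=None)
        # hypothetical "workbook" column records the source file name
        frames.append(combine_sheets(sheets).assign(workbook=path.name))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling combine_workbooks(r"\mydirectory") and then looking at dataframe.groupby(["workbook", "sheet"]).size() should show one row count per sheet, which makes it easy to see whether any sheets are missing from the result.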