
Iterate through all sheets of all workbooks in a directory

I am trying to combine all sheets from all workbooks in a directory into a single DataFrame. I've tried both glob and os.scandir, but either way I only get the first sheet of each workbook. First attempt:

import pandas as pd
import glob

workbooks = glob.glob(r"\mydirectory\*.xlsx")
frames = []
for file in workbooks:
    df = pd.concat(pd.read_excel(file, sheet_name=None), ignore_index=True)
    frames.append(df)
dataframe = pd.concat(frames, axis=0)

Second attempt:

import os
import pandas as pd
df = pd.DataFrame()
path = r"\mydirectory"
with os.scandir(path) as files:
    for file in files:
        data = pd.read_excel(file, sheet_name = None)
        df = df.append(data) 

I think the issue lies with the for loop but I'm too inexperienced to pin down the problem. Any help would be greatly appreciated, thx!!!

If I understand your question correctly, you want something like this:

import pandas as pd
import glob

# list of workbooks in the directory
workbooks = glob.glob(r"\mydirectory\*.xlsx")
frames = []

# for each workbook in the list
for file in workbooks:
    # ExcelFile opens the workbook once and exposes its sheet names
    xl_file = pd.ExcelFile(file)
    # read every sheet and concatenate the resulting DataFrames
    df = pd.concat(
        [pd.read_excel(xl_file, sheet_name=sheet) for sheet in xl_file.sheet_names],
        ignore_index=True,
    )
    # collect the per-workbook DataFrame
    frames.append(df)
# concatenate all per-workbook DataFrames into one DataFrame
dataframe = pd.concat(frames, axis=0, ignore_index=True)

The only difference from what you had already written is that this loops explicitly over the sheets within each Excel file and concatenates them.
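
As a variation on the same idea, here is a minimal sketch that relies on pd.read_excel(..., sheet_name=None), which returns a dict mapping each sheet name to its DataFrame. The function name combine_workbooks, the reader parameter, and the added workbook and sheet columns are my own illustrative choices, not anything from the question. Frames are collected in a list and concatenated once, since DataFrame.append was removed in pandas 2.0.

```python
from pathlib import Path

import pandas as pd


def combine_workbooks(directory, pattern="*.xlsx", reader=pd.read_excel):
    """Stack every sheet of every matching workbook into one DataFrame.

    `reader` is a hypothetical hook (defaulting to pd.read_excel) so the
    loop can be exercised without real Excel files.
    """
    frames = []
    for path in sorted(Path(directory).glob(pattern)):
        # sheet_name=None returns {sheet name: DataFrame} for all sheets
        sheets = reader(path, sheet_name=None)
        for sheet_name, sheet_df in sheets.items():
            # Tag each row with its origin (illustrative column names)
            frames.append(sheet_df.assign(workbook=path.name, sheet=sheet_name))
    if not frames:
        # No matching workbooks: return an empty frame instead of raising
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling combine_workbooks(r"\mydirectory") should then give one DataFrame with a fresh 0..n-1 index. The extra workbook and sheet columns make it easy to check that rows from every sheet actually arrived, which is handy when debugging a "only the first sheet" symptom.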
