
Extracting multiple Excel files as a pandas DataFrame

I'm trying to create a data ingestion routine that loads data from multiple Excel files, each with multiple tabs and columns, into a pandas DataFrame. The tabs are structured the same way in every file. Any help would be appreciated!

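import os
import pandas as pd

folder = "specified_path"
files = os.listdir(folder)
sheet_contents = {}

for file in files:
    # os.path.join adds the path separator that folder + file leaves out
    data = pd.ExcelFile(os.path.join(folder, file))
    file_data = {}

    for sheet in data.sheet_names:
        file_data[sheet] = data.parse(sheet)

    # key by file name without its extension (works for .xls as well as .xlsx)
    sheet_contents[os.path.splitext(file)[0]] = file_data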

One way to create a dataframe for each Excel file (stored in a specific folder and holding multiple sheets) is to combine pandas.read_excel and pandas.concat. Passing sheet_name=None to pandas.read_excel reads all the sheets of the file at once and returns them as a dictionary of dataframes keyed by sheet name, which pandas.concat then stacks into a single dataframe.

Try this:

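import os
import pandas as pd

folder = 'specified_path'

# keep only Excel workbooks; os.listdir also returns any other files in the folder
excel_files = [file for file in os.listdir(folder) if file.endswith(('.xlsx', '.xls'))]

list_of_dfs = []
for file in excel_files:
    # sheet_name=None returns a dict of dataframes, one per sheet; concat stacks them
    df = pd.concat(pd.read_excel(os.path.join(folder, file), sheet_name=None), ignore_index=True)
    df['excelfile_name'] = os.path.splitext(file)[0]
    list_of_dfs.append(df)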

To access one of the dataframes created, use its index (e.g., list_of_dfs[0]):

print(type(list_of_dfs[0]))
<class 'pandas.core.frame.DataFrame'>
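
Note that concatenating with ignore_index=True discards the sheet names. If it matters which tab each row came from, a minimal sketch along these lines could keep that information. The helper name combine_sheets and the sheet_name column are assumptions made for illustration, and the in-memory dictionary only stands in for what pd.read_excel(path, sheet_name=None) returns:

```python
import pandas as pd


def combine_sheets(sheets, source_name):
    """Stack a {sheet name: DataFrame} mapping into one DataFrame.

    Hypothetical helper: adds a 'sheet_name' column and an
    'excelfile_name' column so each row can be traced to its origin.
    """
    # Tag every sheet's rows with the sheet name before stacking them.
    frames = [df.assign(sheet_name=name) for name, df in sheets.items()]
    combined = pd.concat(frames, ignore_index=True)
    combined["excelfile_name"] = source_name
    return combined


# Stand-in for pd.read_excel(path, sheet_name=None), which returns
# a dict of DataFrames keyed by sheet name.
sheets = {
    "Sheet1": pd.DataFrame({"a": [1, 2]}),
    "Sheet2": pd.DataFrame({"a": [3]}),
}

result = combine_sheets(sheets, "report")
print(result)
```

The per-file results can then be merged into a single dataframe with pd.concat(list_of_dfs, ignore_index=True) if one table for the whole folder is preferred over a list.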

