
Extracting multiple Excel files as pandas DataFrames

I'm trying to create a data ingestion routine that loads data from multiple Excel files, each with multiple tabs and columns, into pandas DataFrames. The tabs are structured the same way in every file. Any help would be appreciated!

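import os
import pandas as pd

folder = "specified_path"
files = os.listdir(folder)
sheet_contents = {}

for file in files:
    # Build the path with os.path.join so the separator is not lost
    data = pd.ExcelFile(os.path.join(folder, file))
    file_data = {}

    # Parse every sheet of the workbook into its own DataFrame
    for sheet in data.sheet_names:
        file_data[sheet] = data.parse(sheet)

    # Key the result by file name without its extension
    sheet_contents[os.path.splitext(file)[0]] = file_data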

One way to create a DataFrame for each Excel file (stored in a given folder, each holding multiple sheets) is to combine pandas.read_excel and pandas.concat. Passing sheet_name=None to pandas.read_excel reads all the sheets of a workbook at once and returns a dict of DataFrames keyed by sheet name.

Try this:

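import os
import pandas as pd

folder = 'specified_path'

# Keep only Excel workbooks; other files in the folder would make read_excel fail
excel_files = [file for file in os.listdir(folder) if file.endswith(('.xlsx', '.xls'))]

list_of_dfs = []
for file in excel_files:
    # sheet_name=None returns {sheet name: DataFrame}; concat stacks the sheets
    df = pd.concat(pd.read_excel(os.path.join(folder, file), sheet_name=None), ignore_index=True)
    df['excelfile_name'] = os.path.splitext(file)[0]
    list_of_dfs.append(df)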

To access one of the DataFrames created, you can use its index (e.g., list_of_dfs[0]):

print(type(list_of_dfs[0]))
<class 'pandas.core.frame.DataFrame'>
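
If you also want to keep track of which sheet each row came from, or end up with a single DataFrame for the whole folder, something along these lines might work. This is only a sketch: the helper names combine_sheets and load_folder, the sheet_name column, and the restriction to .xlsx files are my own assumptions, not part of the question.

```python
from pathlib import Path

import pandas as pd


def combine_sheets(sheets, file_label):
    """Stack a {sheet name: DataFrame} dict into one frame, tagging each row.

    Hypothetical helper: `sheets` is what pd.read_excel(..., sheet_name=None) returns.
    """
    # Passing a dict to pd.concat uses its keys as the outer index level
    combined = pd.concat(sheets, names=["sheet_name", "row"])
    # Move the sheet name out of the index into an ordinary column
    combined = combined.reset_index(level="sheet_name").reset_index(drop=True)
    combined["excelfile_name"] = file_label
    return combined


def load_folder(folder):
    """Read every .xlsx file in `folder` into one combined DataFrame (hypothetical helper)."""
    frames = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        sheets = pd.read_excel(path, sheet_name=None)  # all sheets at once
        frames.append(combine_sheets(sheets, path.stem))
    if not frames:
        # No workbooks found: return an empty frame instead of letting concat raise
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling load_folder('specified_path') would then give one DataFrame with sheet_name and excelfile_name columns, which you can filter or group by as needed.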
