Data extraction from multiple Excel files in a pandas dataframe

I'm trying to create a data ingestion routine that loads data from multiple Excel files, each with multiple tabs and columns, into pandas data frames. The tabs are structured the same way in every Excel file. Each tab of an Excel file should be a separate data frame. So far, I have created a list with one data frame per Excel file, which holds the data from all of that file's tabs concatenated together. What I'm trying to find is a data structure that lets me access each Excel file, and each tab of that file as a separate data frame. The current code is below. Any improvements would be appreciated! Please let me know if anything else is needed.

import os
import pandas as pd

# Path to the folder that holds the Excel files
folder = 'specified_path'

# List of the files in that folder
excel_files = os.listdir(folder)

list_of_dfs = []
for file in excel_files:
    # sheet_name=None reads every tab; pd.concat stacks them into one data frame
    df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None), ignore_index=True)
    # Record the source file name (the part before the first dot)
    df['excelfile_name'] = file.split('.')[0]
    list_of_dfs.append(df)

I would propose changing the line

    df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None), ignore_index=True)

to

    df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None))
    df.index = df.index.get_level_values(0)
    # reset_index returns a new data frame, so assign the result back
    df = df.reset_index().rename({'index':'Tab'}, axis=1)
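
As a minimal sketch of what this change produces, the snippet below runs the same three steps on an in-memory dict that stands in for the result of `pd.read_excel(..., sheet_name=None)`. The tab names `Tab1` and `Tab2` and the values are made up for illustration; no Excel file is read.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None), which returns a dict
# mapping each tab name to its DataFrame. Tab names and values are made up.
sheets = {
    "Tab1": pd.DataFrame({"Reviewed": [43, 16], "Adjusted": [20, 8]}),
    "Tab2": pd.DataFrame({"Reviewed": [8, 17], "Adjusted": [3, 3]}),
}

# Concatenating a dict yields a two-level index: (tab name, row number)
df = pd.concat(sheets)
# Keep only the tab name as the index
df.index = df.index.get_level_values(0)
# Move the tab name into a regular column called 'Tab'
df = df.reset_index().rename({"index": "Tab"}, axis=1)
```

Every row now carries the name of the tab it came from in the `Tab` column, so rows from different tabs stay distinguishable after concatenation.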

To create a separate dataframe for each tab of an Excel file (in the example below, every tab has the same content), one could iterate over the values of index level 0 and select the rows for each value:

df = pd.concat(pd.read_excel(filename, sheet_name=None))
list_of_dfs = []
for tab in df.index.get_level_values(0).unique():
    tab_df = df.loc[tab]
    list_of_dfs.append(tab_df)
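
If looking tabs up by name is more convenient than by list position, the same loop can fill a dict instead. This is only a sketch: `dfs_by_tab` is a hypothetical name, and the in-memory `sheets` dict with made-up tab names and values stands in for what `pd.read_excel(filename, sheet_name=None)` would return.

```python
import pandas as pd

# Stand-in for pd.read_excel(filename, sheet_name=None); data is made up.
sheets = {
    "Tab1": pd.DataFrame({"Reviewed": [43, 16], "Adjusted": [20, 8]}),
    "Tab2": pd.DataFrame({"Reviewed": [8, 17], "Adjusted": [3, 3]}),
}

df = pd.concat(sheets)  # index level 0 holds the tab names

# Same iteration as above, but keyed by tab name instead of appended to a list
dfs_by_tab = {
    tab: df.loc[tab]
    for tab in df.index.get_level_values(0).unique()
}
```

Note that `pd.read_excel` with `sheet_name=None` already returns such a dict, so the split is only needed when starting from a dataframe that has already been concatenated.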

For illustration, here is the dataframe content after reading an Excel file with 3 tabs:

[image: the complete dataframe]

After running the above code, here is the content of list_of_dfs:

[        Date  Reviewed  Adjusted
 0 2022-07-11        43        20
 1 2022-07-18        16         8
 2 2022-07-25         8         3
 3 2022-08-01        17         3
 4 2022-08-15        14         6
 5 2022-08-22        12         5
 6 2022-08-29         8         4,
         Date  Reviewed  Adjusted
 0 2022-07-11        43        20
 1 2022-07-18        16         8
 2 2022-07-25         8         3
 3 2022-08-01        17         3
 4 2022-08-15        14         6
 5 2022-08-22        12         5
 6 2022-08-29         8         4,
         Date  Reviewed  Adjusted
 0 2022-07-11        43        20
 1 2022-07-18        16         8
 2 2022-07-25         8         3
 3 2022-08-01        17         3
 4 2022-08-15        14         6
 5 2022-08-22        12         5
 6 2022-08-29         8         4]
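
To reach each Excel file and each of its tabs from one data structure, as the question asks, one possible layout is a nested dict of the form `{file name: {tab name: dataframe}}`. The sketch below is an assumption, not part of the original answer: `load_workbooks` is a hypothetical helper, its `read` parameter exists only so the reader function can be swapped out, and the `.xlsx`/`.xls` filter is a guess about which files the folder contains.

```python
import os
import pandas as pd

def load_workbooks(folder, read=pd.read_excel):
    """Return {file name without extension: {tab name: DataFrame}}.

    Hypothetical helper; `read` defaults to pd.read_excel and is a
    parameter only so that another reader can be substituted.
    """
    workbooks = {}
    for file in sorted(os.listdir(folder)):
        name, ext = os.path.splitext(file)
        # Assumption: only .xlsx and .xls files should be loaded
        if ext.lower() not in (".xlsx", ".xls"):
            continue
        # sheet_name=None returns a dict of one DataFrame per tab
        workbooks[name] = read(os.path.join(folder, file), sheet_name=None)
    return workbooks
```

With this layout, `workbooks['some_file']['some_tab']` would be the dataframe for one tab of one file, where both keys are placeholders for actual file and tab names.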
