Data extraction from multiple Excel files into pandas DataFrames
I'm trying to create a data ingestion routine that loads data from multiple Excel files, each with multiple tabs and columns, into pandas DataFrames. The structure of the tabs is the same in every Excel file. Each tab of an Excel file should become a separate DataFrame. So far, I have created a list with one DataFrame per Excel file, which holds the data from all of that file's tabs concatenated together. However, I'm looking for a way to access each Excel file from a data structure, and each tab of that file as a separate DataFrame. My current code is below. Any improvements would be appreciated. Please let me know if anything else is needed.
import os
import pandas as pd

# Assign the path to the folder variable
folder = 'specified_path'
# Get the list of files from the assigned path
excel_files = [file for file in os.listdir(folder)]
list_of_dfs = []
for file in excel_files:
    # sheet_name=None reads every tab; pd.concat stacks them into one DataFrame
    df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None), ignore_index=True)
    df['excelfile_name'] = file.split('.')[0]
    list_of_dfs.append(df)
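I would propose changing the line
df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None), ignore_index=True)
to
df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None))
df.index = df.index.get_level_values(0)
df = df.reset_index().rename({'index':'Tab'}, axis=1)
This keeps the tab name of every row in a column named Tab. Note that the result of reset_index().rename(...) has to be assigned back to df, because neither call modifies df in place.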
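The effect of dropping ignore_index=True can be sketched without an Excel file by using an in-memory dict in place of the value returned by pd.read_excel(..., sheet_name=None). The sheet names and numbers below are invented for illustration, and passing names= to pd.concat is a variation on the approach above, not the answer's own code.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None), which returns a dict
# mapping sheet names to DataFrames. Names and values here are made up.
sheets = {
    "Tab1": pd.DataFrame({"Reviewed": [43, 16], "Adjusted": [20, 8]}),
    "Tab2": pd.DataFrame({"Reviewed": [8, 17], "Adjusted": [3, 3]}),
}

# Concatenating a dict puts its keys in the outer index level.
# names= labels the two levels: (sheet name, original row label).
combined = pd.concat(sheets, names=["Tab", "row"])

# Move the sheet name into a regular column and discard the row labels.
combined = combined.reset_index(level="Tab").reset_index(drop=True)

print(combined)
```

Each row now carries the name of the tab it came from, so the combined frame can still be filtered by tab later.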
To create a separate DataFrame for each tab of an Excel file (the tabs in this example have duplicated content), one can iterate over the level-0 index values and select the rows for each of them:
# filename is the path to one Excel file
df = pd.concat(pd.read_excel(filename, sheet_name=None))
list_of_dfs = []
for tab in df.index.get_level_values(0).unique():
    # df.loc[tab] selects all rows whose outer index level equals the tab name
    tab_df = df.loc[tab]
    list_of_dfs.append(tab_df)
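As a possible alternative to the loop above, the concatenated frame can be split with groupby on the outer index level, which keeps the tab name attached to each piece. This is only a sketch: the dict named sheets stands in for the output of pd.read_excel(..., sheet_name=None), and its sheet names and values are made up.

```python
import pandas as pd

# Stand-in for pd.read_excel(filename, sheet_name=None); contents are invented.
sheets = {
    "Tab1": pd.DataFrame({"Reviewed": [43, 16], "Adjusted": [20, 8]}),
    "Tab2": pd.DataFrame({"Reviewed": [8, 17], "Adjusted": [3, 3]}),
}

# Outer index level = sheet name, inner level = original row label.
combined = pd.concat(sheets)

# Group on the outer level and drop it again, giving {tab name: DataFrame}.
per_tab = {
    tab: frame.droplevel(0)
    for tab, frame in combined.groupby(level=0)
}

print(per_tab["Tab2"])
```

Note that sheets is already a per-tab mapping, so splitting is only needed when the frames have been concatenated first.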
For illustration, consider an Excel file with 3 tabs. After running the loop above, here is the content of list_of_dfs:
[        Date  Reviewed  Adjusted
0 2022-07-11        43        20
1 2022-07-18        16         8
2 2022-07-25         8         3
3 2022-08-01        17         3
4 2022-08-15        14         6
5 2022-08-22        12         5
6 2022-08-29         8         4,
         Date  Reviewed  Adjusted
0 2022-07-11        43        20
1 2022-07-18        16         8
2 2022-07-25         8         3
3 2022-08-01        17         3
4 2022-08-15        14         6
5 2022-08-22        12         5
6 2022-08-29         8         4,
         Date  Reviewed  Adjusted
0 2022-07-11        43        20
1 2022-07-18        16         8
2 2022-07-25         8         3
3 2022-08-01        17         3
4 2022-08-15        14         6
5 2022-08-22        12         5
6 2022-08-29         8         4]
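To reach each Excel file and each of its tabs by name, as the question asks, one option is a nested dict of the form {file name: {tab name: DataFrame}}. The following is a rough sketch and not part of the answer above: load_workbooks, pattern, and reader are hypothetical names, and the reader parameter exists only so the function can be tried without real Excel files.

```python
from pathlib import Path

import pandas as pd


def load_workbooks(folder, pattern="*.xlsx", reader=pd.read_excel):
    """Return {workbook stem: {sheet name: DataFrame}} for matching files.

    All names here are hypothetical; `reader` defaults to pd.read_excel and
    is a parameter only to make the sketch easy to exercise.
    """
    workbooks = {}
    for path in sorted(Path(folder).glob(pattern)):
        # sheet_name=None returns every sheet as a dict keyed by sheet name,
        # so each tab stays a separate DataFrame and nothing is concatenated.
        workbooks[path.stem] = reader(path, sheet_name=None)
    return workbooks
```

Calling load_workbooks('specified_path') would then give a structure in which data[file_name][tab_name] is the DataFrame for one tab of one file, where file_name is the file name without its extension.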