
Dynamically import EXCEL sheets and assign them to DataFrames in Python using pandas

I have an EXCEL file with multiple sheets (far more than the three used in this example). I would like to import them dynamically, sheet by sheet, and add a suffix to the column names of each one to tell them apart, since they contain the same variables acquired at different times. I am able to do it using the following code:

import pandas as pd   

filename = 'test.xlsx'
xls   = pd.ExcelFile(filename)

df_1  = pd.read_excel(xls, '#1')
df_1  = df_1.add_suffix('_1')                                           
df_2  = pd.read_excel(xls, '#2')
df_2  = df_2.add_suffix('_2')                                          
df_3  = pd.read_excel(xls, '#3')
df_3  = df_3.add_suffix('_3')     

However, this becomes a bit tedious when I have a large number of variables assigned to different sheets. Thus, I would like to see if there is a way to dynamically do this with a for loop, whereby I would also update the DataFrame name for each iteration.

  • Is there a way to do this?
  • Is it recommended to assign variables dynamically?

import pandas as pd

filename = 'test.xlsx'
xls   = pd.ExcelFile(filename)
dfs = []
# xls.sheet_names lists every sheet name in the workbook, in order.
# Start counting at 1 so the suffixes match the question (_1, _2, ...).
for c, sheet in enumerate(xls.sheet_names, start=1):
    df = pd.read_excel(xls, sheet)
    df = df.add_suffix('_' + str(c))
    dfs.append(df)

# dfs[0], dfs[1], ... hold the DataFrames of the respective sheets
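
A variation on the loop above: `pd.read_excel` with `sheet_name=None` returns a dict that maps each sheet name to its DataFrame, so the whole workbook can be read in one call. The sketch below is only an illustration. `add_sheet_suffixes` is a hypothetical helper name, and a small in-memory dict stands in for the real workbook so the snippet runs without `test.xlsx`.

```python
import pandas as pd


def add_sheet_suffixes(sheets):
    """Return a list of DataFrames whose columns carry a 1-based suffix.

    `sheets` is a mapping of sheet name -> DataFrame, which is the shape
    that pd.read_excel(filename, sheet_name=None) returns.
    """
    return [
        df.add_suffix('_' + str(position))
        for position, df in enumerate(sheets.values(), start=1)
    ]


# Stand-in for: sheets = pd.read_excel('test.xlsx', sheet_name=None)
sheets = {
    '#1': pd.DataFrame({'a': [1, 2], 'b': [3, 4]}),
    '#2': pd.DataFrame({'a': [5, 6], 'b': [7, 8]}),
}

dfs = add_sheet_suffixes(sheets)
print(dfs[1].columns.tolist())  # ['a_2', 'b_2']
```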

I tried a couple of more Pythonic approaches to the scenario you described, using a list comprehension and a dict comprehension (use whichever you prefer).

df_dict = { 'df_' + str(c) : pd.read_excel(xls, i) for c, i in enumerate(xls.sheet_names, 1)}
df_list = [pd.read_excel(xls, i) for i in xls.sheet_names]

print(df_dict['df_1'])
print(df_list[0])

As the two print calls show, both give the same DataFrame for the first sheet.

With the list, you access your data through a numeric index (df_list[0], df_list[1] and so on).

With the dict, you access it through keys that use the names you suggested, the first key being df_dict['df_1'], for example.
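
Note that neither comprehension applies the column suffixes asked for in the question. A minimal sketch of adding them follows, assuming the sheet names follow the '#1', '#2' pattern from the question, so the label is taken from the sheet name rather than from its position. `frames_by_sheet_label` is a hypothetical helper, and the in-memory dict stands in for the real workbook.

```python
import pandas as pd


def frames_by_sheet_label(sheets):
    """Map 'df_<label>' -> DataFrame with '_<label>' appended to each column.

    Assumes sheet names look like '#1', '#2', ... as in the question.
    """
    result = {}
    for name, df in sheets.items():
        label = name.lstrip('#')  # '#2' -> '2'
        result['df_' + label] = df.add_suffix('_' + label)
    return result


# Stand-in for: sheets = pd.read_excel('test.xlsx', sheet_name=None)
sheets = {
    '#1': pd.DataFrame({'x': [1.0], 'y': [2.0]}),
    '#2': pd.DataFrame({'x': [3.0], 'y': [4.0]}),
}

frames = frames_by_sheet_label(sheets)
print(frames['df_2'].columns.tolist())  # ['x_2', 'y_2']
```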

Another approach is to create the variables dynamically by assigning them into the globals() dictionary. For example, the code below produces the same result as the snippets shown above:

for c, i in enumerate(xls.sheet_names, 1):
    globals()['df_' + str(c)] = pd.read_excel(xls, i) 

print(df_1)

However, I don't recommend this unless it is REALLY mandatory, since you can easily lose track of the variables created in your program.
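
To illustrate why a container is usually easier to work with than dynamically created names, here is a small sketch with made-up data. Every DataFrame stays reachable in a loop without knowing how many were created, and the suffixed columns let the frames be placed side by side without name clashes.

```python
import pandas as pd

# Made-up frames standing in for the suffixed sheets.
frames = {
    'df_1': pd.DataFrame({'a_1': [1, 2]}),
    'df_2': pd.DataFrame({'a_2': [3, 4]}),
}

# Iterate over every frame without hard-coding variable names.
for key, df in frames.items():
    print(key, df.shape)

# Place all acquisitions side by side, aligned on the row index.
combined = pd.concat(list(frames.values()), axis=1)
print(combined.columns.tolist())  # ['a_1', 'a_2']
```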
