
Read all xlsx-Files in a folder and save the files in different DataFrames

I have the following situation:

  1. I have a folder with different xlsx files and want to save each of them in its own dataframe (df2 to dfx), one dataframe per file. For example: "Hello.xlsx" in df2, "Bye.xlsx" in df3...

  2. After that I want to call "df1.update(dfx)" for every new dataframe I created.

df1 = original dataframe which I already have.

dfx = any of the dataframes created in step 1.

There are some solutions for step 1 on StackOverflow, but they all save the xlsx files in one big dataframe, which is not what I want.

Thank you :)

The code I am "using" right now:

import os

path = os.getcwd()
files = os.listdir(path)
files

Output: 
['.ipynb_checkpoints',
 'Konsolidierungs-Tool Invoice.ipynb',
 'Test.xlsx',
 'Test1.xlsx',
 'Test2.xlsx',
 'Test3.xlsx']

files_xls = [f for f in files if f[-3:] == 'xlsx']
files_xls

Output: [] --> I don't know why it is empty

I'm assuming you already have the part that reads the files into data frames and only need the variable-naming part.

A couple of ways you can work with this:

  1. Use a dictionary with keys as the dfx names and the values being the data frames
  2. Use exec to build each assignment as a string and execute it as Python code.

For the second, you should read the official docs first.
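
The first option can be sketched as follows, and it avoids exec entirely. The helper name load_excel_frames and its reader parameter are made up for this example; by default it calls pd.read_excel on each file, which reads the first sheet and needs an engine such as openpyxl installed for .xlsx files.

```python
from pathlib import Path

import pandas as pd


def load_excel_frames(folder=".", reader=pd.read_excel):
    """Return {file name without extension: DataFrame} for each .xlsx file in folder."""
    frames = {}
    for path in sorted(Path(folder).glob("*.xlsx")):
        # Skip Excel lock files such as "~$Test.xlsx" (present while a workbook is open).
        if path.name.startswith("~$"):
            continue
        frames[path.stem] = reader(path)
    return frames
```

With the files from the question, load_excel_frames() would give a dict whose keys are "Hello", "Bye" and so on, so frames["Hello"] takes the place of df2.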

Edit: The following should load your xlsx files into a series of dataframes:

import pandas as pd
import os

path = os.getcwd()
files = os.listdir(path)

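# Match on the full '.xlsx' extension
files_xls = [f for f in files if f.endswith('.xlsx')]

# Start at 2 so the generated names (df2, df3, ...) do not overwrite an existing df1
for index, filename in enumerate(files_xls, start=2):
    # The default reads the first sheet into one DataFrame;
    # sheet_name=None would instead return a dict with one DataFrame per sheet
    exec(f"df{index} = pd.read_excel({filename!r})")

You will then be able to see the dataframes with the variable names df2 , df3 , etc.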
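
On the empty list in the question: f[-3:] takes only the last three characters ('lsx'), so it can never equal the four-character string 'xlsx'. Comparing the extension itself instead of counting characters avoids that kind of off-by-one. A small sketch using the listing from the question (the function name only_xlsx is made up for this example):

```python
import os


def only_xlsx(names):
    """Keep the names whose extension is .xlsx, ignoring case."""
    return [n for n in names if os.path.splitext(n)[1].lower() == ".xlsx"]


listing = [
    ".ipynb_checkpoints",
    "Konsolidierungs-Tool Invoice.ipynb",
    "Test.xlsx",
    "Test1.xlsx",
    "Test2.xlsx",
    "Test3.xlsx",
]

print("Test.xlsx"[-3:])    # lsx
print(only_xlsx(listing))  # ['Test.xlsx', 'Test1.xlsx', 'Test2.xlsx', 'Test3.xlsx']
```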

You can try this to read all Excel files in a directory, including subfolders:

import pandas as pd
import os

# Your current directory (containing the Python script and all Excel files)
mydir = (os.getcwd()).replace('\\','/') + '/'

# Collect all Excel files, including those in subdirectories
filelist=[]
for path, subdirs, files in os.walk(mydir):
    for file in files:
        # endswith accepts a tuple of suffixes
        if file.endswith(('.xlsx', '.xls', '.XLS')):
            filelist.append(os.path.join(path, file))
number_of_files=len(filelist)
print(filelist)

# Read every Excel file into its own DataFrame (df[0] to df[x]),
# where x is the number of files read successfully minus 1
df=[]
for filename in filelist:
    try:
        df.append(pd.read_excel(filename))
    except Exception:
        # Files that cannot be parsed (e.g. an empty file) are skipped
        print('Empty Excel File!')
print(df)

Output (in my example I have 4 Excel files: 3 store phone numbers and 1 is empty):

['D:/SOF/Book1.xlsx', 'D:/SOF/Book2.xlsx', 'D:/SOF/a\\New Text Document.xlsx', 'D:/SOF/subdir1\\Book3.xlsx']
Empty Excel File!

[     Name        Phone
0    alfa  82330403045
1    fafa  82330403046
2  albert  82330403047
3    john  82330403048,      Name    PhoneCell
0    alfa  82330403049
1    fafa  82330403050
2  albert  82330403051
3    john  82330403052,      Name    PhoneCell
0    alfa  82330403049
1    fafa  82330403050
2  albert  82330403051
3    john  82330403052]
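
For step 2 of the question, once the frames are in a list they can be applied to the original frame in a plain loop. Below is a sketch with small in-memory frames standing in for the Excel data; the column names and values are invented for illustration. Note that DataFrame.update modifies df1 in place, aligns on index and column labels, and ignores NaN values in the other frame.

```python
import pandas as pd

# Stand-in for the original frame (df1 in the question).
df1 = pd.DataFrame({"Name": ["alfa", "fafa"], "Phone": ["000", "000"]})

# Stand-ins for the frames read from the Excel files.
frames = [
    pd.DataFrame({"Phone": ["111"]}, index=[0]),
    pd.DataFrame({"Phone": ["222"]}, index=[1]),
]

for frame in frames:
    df1.update(frame)  # works in place and returns None

print(df1)
```

Because update matches rows by index label, the Excel frames need an index that lines up with df1 (for example via set_index on a shared key column) for this to change the intended rows.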

Hope this can help you :)


 