I have the following situation:
1. I have a folder with different xlsx files and want to save each of them in its own dataframe (df2 ... dfx), one dataframe per file. For example: "Hello.xlsx" in df2, "Bye.xlsx" in df3, and so on.
2. After that I want to apply "df1.update(dfx)" to every new dataframe I created.
df1 = the original dataframe, which I already have.
dfx = x stands for each of the dataframes I created in step 1.
There are some solutions for step 1 on StackOverflow, but they all save the xlsx files in one big dataframe, which is not what I want.
Thank you :)
The code I am using right now:
path = os.getcwd()
files = os.listdir(path)
files
Output:
['.ipynb_checkpoints',
'Konsolidierungs-Tool Invoice.ipynb',
'Test.xlsx',
'Test1.xlsx',
'Test2.xlsx',
'Test3.xlsx']
files_xls = [f for f in files if f[-3:] == 'xlsx']
files_xls
Output: [] --> I don't know why it is empty
I'm assuming you already have the part that saves the files into data frames and you just need the variable-name part.
A couple of ways you can work with this:
Use exec to take the string version of the names and execute it as Python code. For the second, you should read the official docs.
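As an alternative to generating variable names, the dataframes can be kept in a dictionary keyed by file name. The following is only a minimal sketch under stated assumptions: the function name `load_frames` and its `reader` parameter are hypothetical, and `pd.read_excel` needs an Excel engine such as openpyxl installed to read .xlsx files.

```python
from pathlib import Path

import pandas as pd


def load_frames(folder, reader=pd.read_excel):
    """Map each .xlsx file stem in `folder` to whatever `reader` returns for it.

    `load_frames` and `reader` are hypothetical names used for illustration;
    by default every file is read with pd.read_excel (first sheet only).
    """
    frames = {}
    # sorted() makes the order of the dictionary keys predictable
    for path in sorted(Path(folder).glob("*.xlsx")):
        frames[path.stem] = reader(path)
    return frames
```

With the folder from the question, `frames = load_frames(os.getcwd())` would give access to the data as `frames["Test"]`, `frames["Test1"]` and so on, without creating variables dynamically.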
Edit: The following should load your xlsx files to a series of dataframes:
import pandas as pd
import os
path = os.getcwd()
files = os.listdir(path)
files_xls = [f for f in files if f[-4:] == 'xlsx']
for index, filename in enumerate(files_xls):
    # sheet_name=None reads every sheet, so each dfN is a dict of {sheet name: DataFrame}
    exec(f"df{index} = pd.read_excel({filename!r}, sheet_name=None)")
You will then be able to access the dataframes under the variable names df0, df1, etc.
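The filter in the question returned an empty list because `f[-3:]` yields only the last three characters of the name ('lsx'), which can never equal the four-character string 'xlsx'. A minimal sketch of a filter based on `str.endswith`, using the file list from the question; the helper name `select_xlsx` is hypothetical:

```python
def select_xlsx(names):
    """Return the names ending in '.xlsx', ignoring upper/lower case."""
    return [n for n in names if n.lower().endswith(".xlsx")]


# File list copied from the question's output
files = [
    ".ipynb_checkpoints",
    "Konsolidierungs-Tool Invoice.ipynb",
    "Test.xlsx",
    "Test1.xlsx",
    "Test2.xlsx",
    "Test3.xlsx",
]

# The original filter compares three characters against four, so nothing matches
print([f for f in files if f[-3:] == "xlsx"])  # []

# Matching on the full extension keeps the four workbooks
print(select_xlsx(files))  # ['Test.xlsx', 'Test1.xlsx', 'Test2.xlsx', 'Test3.xlsx']
```

Checking for '.xlsx' including the dot also avoids matching a name that merely happens to end in the letters 'xlsx'.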
You can try this to read all Excel files in a directory, including subfolders:
import pandas as pd
import os
# Your current directory (containing the Python script and all Excel files)
mydir = os.getcwd().replace('\\', '/') + '/'
# Collect all Excel files, including those in subdirectories
filelist = []
for path, subdirs, files in os.walk(mydir):
    for file in files:
        if file.endswith(('.xlsx', '.xls', '.XLS')):
            filelist.append(os.path.join(path, file))
number_of_files = len(filelist)
print(filelist)
# Read all Excel files into a list of dataframes (df[0] ... df[x]),
# where x is the number of Excel files successfully read, minus 1
df = []
for i in range(number_of_files):
    try:
        df.append(pd.read_excel(filelist[i]))
    except Exception:
        # Files that pandas cannot read (here: an empty file) are skipped
        print('Empty Excel File!')
print(df)
Output (in my example I have 4 Excel files: 3 of them store phone numbers and 1 is empty):
['D:/SOF/Book1.xlsx', 'D:/SOF/Book2.xlsx', 'D:/SOF/a\\New Text Document.xlsx', 'D:/SOF/subdir1\\Book3.xlsx']
Empty Excel File!
[    Name        Phone
0    alfa  82330403045
1    fafa  82330403046
2  albert  82330403047
3    john  82330403048,
     Name    PhoneCell
0    alfa  82330403049
1    fafa  82330403050
2  albert  82330403051
3    john  82330403052,
     Name    PhoneCell
0    alfa  82330403049
1    fafa  82330403050
2  albert  82330403051
3    john  82330403052]
Hope this can help you :)
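Once the dataframes are in a list, the second step of the question (calling `df1.update(dfx)` for every loaded dataframe) can be written as a plain loop. This is a minimal sketch, not a definitive implementation: the helper name `update_from_all` is hypothetical, and the small frames below are made-up illustration data rather than the contents of any real file. `DataFrame.update` works in place, aligns on index and column labels, and overwrites values only where the other frame has non-NA entries.

```python
import pandas as pd


def update_from_all(base, others):
    """Call base.update(other) for each frame in `others`, in order.

    `update_from_all` is a hypothetical helper; `base` is modified in place
    and returned only for convenience.
    """
    for other in others:
        base.update(other)
    return base


# Made-up illustration data (floats, so update() does not change the dtype)
df1 = pd.DataFrame({"Name": ["alfa", "fafa"], "Phone": [1.0, 2.0]})
updates = [
    pd.DataFrame({"Phone": [10.0]}, index=[0]),  # changes row 0 only
    pd.DataFrame({"Phone": [20.0]}, index=[1]),  # changes row 1 only
]

update_from_all(df1, updates)
print(df1)
```

Because later frames are applied after earlier ones, a value present in several files ends up with the value from the last file in the list.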