Read all xlsx files in a folder and save the files in different DataFrames
I have the following situation:
1. I have a folder with several xlsx files and want to save each of them in its own DataFrame (df2 ... dfx), so one DataFrame per file. For example: "Hello.xlsx" in df2, "Bye.xlsx" in df3, ...
2. After that I want to call "df1.update(dfx)" for each of the new DataFrames I created.
df1 = the original DataFrame, which I already have.
dfx = x stands for each of the DataFrames created in step 1.
There are some solutions for step 1 on StackOverflow, but they all store the xlsx files in one big DataFrame, which is not what I want.
Thank you :)
The code I am using right now:
path = os.getcwd()
files = os.listdir(path)
files
Output:
['.ipynb_checkpoints',
'Konsolidierungs-Tool Invoice.ipynb',
'Test.xlsx',
'Test1.xlsx',
'Test2.xlsx',
'Test3.xlsx']
files_xls = [f for f in files if f[-3:] == 'xlsx']
files_xls
Output: [] (I don't know why it is empty)
I'm assuming you already have the part that saves the data frames and you just want the variable-name part.
A couple of ways you can work with this, one of which is exec: build the assignment from the string version of the names and execute it as Python code. For the second, you should read the official docs.
Edit: The following should load your xlsx files into a series of DataFrames:
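import os
import pandas as pd

path = os.getcwd()
files = os.listdir(path)
# Compare the last four characters, because 'xlsx' is four characters long
files_xls = [f for f in files if f[-4:] == 'xlsx']
for index, filename in enumerate(files_xls):
    # {filename!r} inserts the file name as a quoted string literal.
    # sheet_name=None returns a dict of {sheet name: DataFrame};
    # drop it to get only the first sheet as a single DataFrame.
    exec(f"df{index} = pd.read_excel({filename!r}, sheet_name=None)")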
You will then be able to see the DataFrames under the variable names df0, df1, etc.
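As a side note on the empty list in the question: `f[-3:]` is only three characters long, so it can never equal the four-character string 'xlsx'. The sketch below shows that, and also a dictionary keyed by file name as an alternative to generating variable names with exec. The helper names `is_xlsx` and `load_frames` are made up for this illustration, and reading only the first sheet of each file is an assumption.

```python
from pathlib import Path

import pandas as pd


def is_xlsx(name):
    """Return True for file names ending in .xlsx, ignoring case."""
    return name.lower().endswith(".xlsx")


def load_frames(folder="."):
    """Read every .xlsx file in `folder` into a dict keyed by file stem.

    Only the first sheet of each workbook is read (pandas default).
    """
    frames = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and is_xlsx(path.name):
            frames[path.stem] = pd.read_excel(path)
    return frames


# The directory listing from the question
names = ['.ipynb_checkpoints', 'Konsolidierungs-Tool Invoice.ipynb',
         'Test.xlsx', 'Test1.xlsx', 'Test2.xlsx', 'Test3.xlsx']

# A three-character slice never equals the four-character 'xlsx'
print([n for n in names if n[-3:] == 'xlsx'])
# → []

print([n for n in names if is_xlsx(n)])
# → ['Test.xlsx', 'Test1.xlsx', 'Test2.xlsx', 'Test3.xlsx']
```

With this, `frames = load_frames()` would give `frames['Test']`, `frames['Test1']` and so on, which keeps each file in its own DataFrame without creating variables dynamically.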
You can try this to read all Excel files in a directory, including subfolders:
import os
import pandas as pd  # pandas loads the Excel engine itself; no explicit xlrd import is needed

# Your current directory (containing the Python script and all Excel files)
mydir = os.getcwd().replace('\\', '/') + '/'

# Collect all Excel files, including those in subdirectories
filelist = []
for path, subdirs, files in os.walk(mydir):
    for file in files:
        if file.endswith(('.xlsx', '.xls', '.XLS')):
            filelist.append(os.path.join(path, file))
number_of_files = len(filelist)
print(filelist)

# Read all Excel files into a list of DataFrames (df[0] ... df[x]),
# where x is the number of Excel files successfully read, minus 1
df = []
for i in range(number_of_files):
    try:
        df.append(pd.read_excel(filelist[i]))
    except Exception:
        # Any read failure ends up here, not only empty files
        print('Empty Ecxcel File!')
print(df)
Output (in my example I have 4 Excel files: 3 of them store phone numbers and 1 is empty):
['D:/SOF/Book1.xlsx', 'D:/SOF/Book2.xlsx', 'D:/SOF/a\\New Text Document.xlsx', 'D:/SOF/subdir1\\Book3.xlsx']
Empty Ecxcel File!
[ Name Phone
0 alfa 82330403045
1 fafa 82330403046
2 albert 82330403047
3 john 82330403048,
Name PhoneCell
0 alfa 82330403049
1 fafa 82330403050
2 albert 82330403051
3 john 82330403052,
Name PhoneCell
0 alfa 82330403049
1 fafa 82330403050
2 albert 82330403051
3 john 82330403052]
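With the frames collected in a list like `df` above, the second step from the question (calling `df1.update(dfx)` for every frame) can be sketched as a plain loop. The helper name `apply_updates` and the small sample frames are made up for this illustration; `DataFrame.update` modifies the frame in place, aligns on index and column labels, and skips NaN values in the frame passed to it.

```python
import pandas as pd


def apply_updates(base, frames):
    """Update `base` in place with each frame in turn; later frames win."""
    for frame in frames:
        # Only non-NaN values from `frame` overwrite the matching cells in `base`
        base.update(frame)
    return base


# Hypothetical stand-ins for df1 and for the frames read from the Excel files
df1 = pd.DataFrame({"Name": ["alfa", "fafa"], "Phone": [1.0, 2.0]})
updates = [
    pd.DataFrame({"Phone": [10.0, None]}),   # changes row 0 only
    pd.DataFrame({"Phone": [None, 20.0]}),   # changes row 1 only
]

apply_updates(df1, updates)
print(df1["Phone"].tolist())
# → [10.0, 20.0]
```

Because the update is label-based, the frames read from the files need the same index and column names as df1 for the values to land in the right cells.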
Hope this can help you :)