
Read all xlsx files in a folder and save them in separate DataFrames

I have the following situation:

  1. I have a folder with several xlsx files and want to save each of them in its own DataFrame (df2 ... dfx), one DataFrame per file. For example: "Hello.xlsx" in df2, "Bye.xlsx" in df3, and so on.

  2. After that, I want to call "df1.update(dfx)" for each of the new DataFrames I created.

df1 = the original DataFrame, which I already have.

dfx = any of the DataFrames created in step 1.

There are some solutions for step 1 on StackOverflow, but they all save the xlsx files in one big DataFrame, which is not what I want.

Thank you :)

The code I am "using" right now:

import os

path = os.getcwd()
files = os.listdir(path)
files

Output: 
['.ipynb_checkpoints',
 'Konsolidierungs-Tool Invoice.ipynb',
 'Test.xlsx',
 'Test1.xlsx',
 'Test2.xlsx',
 'Test3.xlsx']

files_xls = [f for f in files if f[-3:] == 'xlsx']
files_xls

Output: [] (I don't know why it is empty)

I'm assuming you already have the part that saves the data frames and only need the variable-naming part.

There are a couple of ways to handle this:

  1. Use a dictionary whose keys are the dfx names and whose values are the data frames.
  2. Use exec to build each name as a string and execute it as Python code.

For the second, you should read the official docs.

Edit: The following should load your xlsx files into a series of dataframes:

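import os
import pandas as pd

path = os.getcwd()
files = os.listdir(path)

# Keep only the Excel workbooks (the last four characters are 'xlsx')
files_xls = [f for f in files if f[-4:] == 'xlsx']

for index, filename in enumerate(files_xls):
    # Builds and runs e.g.: df0 = pd.read_excel('Test.xlsx', sheet_name=None)
    # sheet_name=None loads every sheet, so each dfN is a dict of
    # {sheet name: DataFrame}; drop the argument to get only the first sheet.
    exec(f"df{index} = pd.read_excel({filename!r}, sheet_name=None)")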

You will then be able to access the dataframes through the variable names df0, df1, etc.
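The dictionary option from the list above could look roughly like the sketch below. The helper names list_xlsx and load_frames are made up for illustration, and it assumes pandas can read .xlsx files in your environment (which normally means openpyxl is installed). It also shows why the filter in the question came back empty: f[-3:] is only three characters long, so it can never equal the four-character string 'xlsx'.

```python
import os
import pandas as pd


def list_xlsx(folder):
    """Return the .xlsx file names in `folder`, sorted for a stable order."""
    return sorted(f for f in os.listdir(folder) if f.lower().endswith(".xlsx"))


def load_frames(folder):
    """Map each file name (without extension) to the DataFrame of its first sheet."""
    frames = {}
    for name in list_xlsx(folder):
        key = os.path.splitext(name)[0]  # "Test1.xlsx" -> "Test1"
        frames[key] = pd.read_excel(os.path.join(folder, name))
    return frames


# Why the filter in the question returned []: a 3-character slice
# can never equal the 4-character string 'xlsx'.
print("Test.xlsx"[-3:])  # lsx
```

With this, frames = load_frames(os.getcwd()) would give you frames["Test1"], frames["Test2"] and so on, which is usually easier to loop over than separate variables created with exec.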

You can try this to read all Excel files in a directory, including subfolders:

import os
import pandas as pd  # pandas loads the Excel engine itself (openpyxl for .xlsx, xlrd for .xls)

# Current working directory (contains this script and the Excel files)
mydir = os.getcwd().replace('\\', '/') + '/'

# Collect all Excel files, including those in subdirectories
filelist = []
for path, subdirs, files in os.walk(mydir):
    for file in files:
        if file.endswith(('.xlsx', '.xls', '.XLS')):
            filelist.append(os.path.join(path, file))
print(filelist)

# Read each Excel file into its own DataFrame (df[0] ... df[x]),
# where x is the number of files read successfully minus 1
df = []
for filepath in filelist:
    try:
        df.append(pd.read_excel(filepath))
    except Exception:
        # Any file that cannot be parsed (e.g. a zero-byte file) is skipped
        print('Empty Excel File!')
print(df)

Output (in my example I have 4 Excel files: 3 store phone numbers and 1 is empty):

['D:/SOF/Book1.xlsx', 'D:/SOF/Book2.xlsx', 'D:/SOF/a\\New Text Document.xlsx', 'D:/SOF/subdir1\\Book3.xlsx']
Empty Excel File!
[     Name        Phone
0    alfa  82330403045
1    fafa  82330403046
2  albert  82330403047
3    john  82330403048,      Name    PhoneCell
0    alfa  82330403049
1    fafa  82330403050
2  albert  82330403051
3    john  82330403052,      Name    PhoneCell
0    alfa  82330403049
1    fafa  82330403050
2  albert  82330403051
3    john  82330403052]

Hope this can help you :)
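For the second step of the question (calling df1.update(dfx) for every frame), a plain loop over the collected frames should be enough. The sketch below uses small made-up frames in place of the ones read from Excel. The column name "Phone" and the index labels are only placeholders, so adjust them to your data.

```python
import pandas as pd

# Hypothetical stand-ins: df1 is the existing frame, `updates` plays the
# role of the frames read from the xlsx files
df1 = pd.DataFrame({"Phone": [1.0, 2.0, 3.0]}, index=["alfa", "fafa", "albert"])
updates = [
    pd.DataFrame({"Phone": [20.0]}, index=["fafa"]),
    pd.DataFrame({"Phone": [30.0, 40.0]}, index=["albert", "john"]),
]

for dfx in updates:
    # update() modifies df1 in place, aligning on index and column labels;
    # only non-NA values in dfx overwrite, and labels absent from df1 are ignored
    df1.update(dfx)

print(df1)
```

Note that update() matches rows by index label, not by position, so the frames need a shared index (for example a name or ID column set with set_index) for the values to line up. Rows that exist only in dfx, such as "john" here, are not added to df1.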

