Read multiple sheets in multiple Excel files using pandas

I am trying to build a list with pandas before feeding all the data sets into 2D convolution layers.

I was able to merge the data from multiple Excel files into one list.

However, the code reads only one chosen sheet from each of the Excel files.

For example, each Excel file has 7 sheets, named 'gpascore1', 'gpascore2', 'gpascore3', 'gpascore4', 'gpascore5', 'gpascore6' and 'gpascore7'.

Each sheet has 4 rows and 425 columns, like this:

[screenshot of a sample sheet]
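Since the seven sheet names follow a fixed pattern, one option is to pass them as a list to the `sheet_name` parameter of `pd.read_excel`, which then returns a dict mapping each sheet name to its DataFrame instead of a single DataFrame. A minimal sketch, assuming every workbook really contains all seven sheets; `SHEET_NAMES` and `read_named_sheets` are hypothetical names chosen for illustration:

```python
import pandas as pd

# Hypothetical constant: the sheet names described above, gpascore1 ... gpascore7.
SHEET_NAMES = [f"gpascore{i}" for i in range(1, 8)]

def read_named_sheets(path, sheet_names=SHEET_NAMES):
    """Read the listed sheets of one workbook and stack them into one DataFrame."""
    # A list passed as sheet_name makes read_excel return a dict
    # {sheet name: DataFrame} instead of a single DataFrame.
    sheets = pd.read_excel(path, sheet_name=sheet_names)
    # Concatenate in the requested order and renumber the index 0..n-1.
    return pd.concat([sheets[name] for name in sheet_names], ignore_index=True)
```

Calling `read_named_sheets(f)` inside the file loop would then collect all seven sheets per file instead of only 'gpascore1'.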

Here is the code:

import os
import pandas as pd

path = os.getcwd()
files = os.listdir(path)

files_xls = [f for f in files if f.endswith('.xls')]

frames = []

for f in files_xls:
    # Reads only one chosen sheet: 'gpascore1' is a sheet name.
    data = pd.read_excel(f, sheet_name='gpascore1')
    # But there are 6 more sheets, and I would like to read data from all of them.
    frames.append(data)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df = pd.concat(frames, ignore_index=True)

data_y = df['admit'].values
data_x = []

for i, rows in df.iterrows():
    data_x.append([rows['gre'], rows['gpa'], rows['rank']])

df = df.dropna()
df.count()
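Two details in the code above may matter later: `data_y` and `data_x` are built before `df.dropna()`, so they still contain the incomplete rows, and the `iterrows` loop is slow on large frames. A minimal sketch that drops incomplete rows first and selects the feature columns directly, assuming the same column names 'admit', 'gre', 'gpa' and 'rank'; the helper name `to_xy` is hypothetical:

```python
import pandas as pd

def to_xy(df):
    """Split a frame into features X and labels y, keeping the rows aligned."""
    # Drop rows missing any of the columns we use, before building X and y,
    # so that both arrays describe exactly the same samples.
    clean = df.dropna(subset=["admit", "gre", "gpa", "rank"])
    data_y = clean["admit"].to_numpy()
    # One row per sample: [gre, gpa, rank], as the iterrows loop builds.
    data_x = clean[["gre", "gpa", "rank"]].to_numpy()
    return data_x, data_y
```

This returns a 2D NumPy array rather than a list of lists; call `data_x.tolist()` if a plain list is really needed. Unlike the original `df.dropna()`, it only checks the four columns it uses.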

Then I got the result below.

[screenshot of the df.count() output]

This is because only the 'gpascore1' sheet from each of the 3 Excel files was merged.

But I also want to read the data in the other 6 sheets of each Excel file.

Could anyone help me find the answer, please?

Thank you

===============<Updated code & errors>===============

Thank you for the answers. I revised the read_excel() call from

data = pd.read_excel(f, 'gpascore1')

to

data = pd.read_excel(f, sheet_name=None)

But now I get KeyErrors like the one below.

[screenshot of the KeyError traceback]

Could you give me any suggestions for this issue, please?

Thank you

I actually found this question under the 'tensorflow' tag. That's hilarious. OK, so you want to merge all the Excel sheets into one DataFrame?

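import glob
import pandas as pd

# List the workbooks to merge (shows the matches in a notebook).
glob.glob("C:\\your_path\\*.xlsx")

frames = []
for f in glob.glob("C:\\your_path\\*.xlsx"):
    df = pd.read_excel(f)  # first sheet of each workbook by default
    frames.append(df)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
all_data = pd.concat(frames, ignore_index=True)

type(all_data)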
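Note that `pd.read_excel` reads only the first sheet of a workbook by default (`sheet_name=0`), so the loop above merges the files but not every sheet inside them. With `sheet_name=None` it returns a dict mapping every sheet name to its DataFrame, which plausibly explains the KeyError in the update: the dict is not a DataFrame with an 'admit' column. A minimal sketch that flattens all sheets of all files into one frame; `combine_sheets`, `read_all_sheets` and the added 'source_file' and 'sheet' columns are hypothetical names for illustration:

```python
import pandas as pd

def combine_sheets(sheets_by_file):
    """Flatten {file name: {sheet name: DataFrame}} into one DataFrame.

    Two bookkeeping columns are added so every row can be traced back:
    'source_file' and 'sheet' (illustrative names, not pandas conventions).
    """
    frames = [
        sheet_df.assign(source_file=file_name, sheet=sheet_name)
        for file_name, sheets in sheets_by_file.items()
        for sheet_name, sheet_df in sheets.items()
    ]
    return pd.concat(frames, ignore_index=True)

def read_all_sheets(file_names):
    # sheet_name=None makes read_excel return {sheet name: DataFrame}
    # for every sheet in the workbook, not just the first one.
    return combine_sheets({f: pd.read_excel(f, sheet_name=None) for f in file_names})
```

For example, `all_data = read_all_sheets(glob.glob("C:\\your_path\\*.xlsx"))` (with `import glob`). Keeping the file and sheet names as columns makes it easy to drop them later with `all_data.drop(columns=["source_file", "sheet"])`. pandas needs the xlrd package to read .xls files and openpyxl for .xlsx files.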

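Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.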

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM