How to use pandas read_excel() for an Excel file with multiple sheets?

I have one Excel file with many sheets. Every sheet has only one column, column A. I plan to read the Excel file with the read_excel() method. Here is the code:

import pandas as PD

ExcelFile  = "C:\\AAA.xlsx"
SheetNames = ['0', '1', 'S', 'B', 'U'] 
# There are five sheets in this excel file. Those are the sheet names.

PageTotal  = len(SheetNames)

for Page in range(PageTotal):
    df = PD.read_excel(ExcelFile, header=None, squeeze = True, parse_cols = "A" ,sheetname=str(SheetNames[Page]))
    print df
    #do something with df

The problem is that the for loop runs only once. When it runs the second item in the for loop, it shows me the following error text:

  File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 227, in __init__
    self.book = xlrd.open_workbook(io)
  File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python27\lib\site-packages\xlrd\xlsx.py", line 824, in open_workbook_2007_xml
    x12sst.process_stream(zflo, 'SST')
  File "C:\Python27\lib\site-packages\xlrd\xlsx.py", line 432, in process_stream_iterparse
    for event, elem in ET.iterparse(stream):
  File "<string>", line 103, in next
IndexError: pop from empty stack

As a beginner I have no idea what this error means. Could anybody please help me correct the code? Thanks.

UPDATE Question:

If it is because the Excel file contains many formulas and external links, why can the for loop still run its first item? Confused.

Why are you using sheetname=str(SheetNames[Page])?

If I understand your question properly, I think what you want is:

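import pandas as pd

# Raw string, so the backslash is written only once
excel_file  = r"C:\AAA.xlsx"
sheet_names = ['0', '1', 'S', 'B', 'U']

# Iterate over the names directly; no index or str() conversion is needed.
# Keyword names are kept as in the question (older pandas API).
for sheet_name in sheet_names:
    df = pd.read_excel(excel_file, header=None, squeeze=True, parse_cols="A", sheetname=sheet_name)
    print(df)
    # do something with df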
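
For readers on a recent pandas release, the same loop might look like the sketch below. It assumes the renamed keywords sheet_name and usecols (which replaced sheetname and parse_cols) and uses DataFrame.squeeze("columns") in place of the squeeze keyword, which pandas 2.0 removed from read_excel. The workbook is a throwaway file with made-up values, written to a temporary directory so the example runs without C:\AAA.xlsx, and it assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Build a small demo workbook (hypothetical data; the sheet names
# mirror the question, the values are invented for illustration).
sheet_names = ["0", "1", "S", "B", "U"]
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    for i, name in enumerate(sheet_names):
        pd.DataFrame({"A": [i, i + 1, i + 2]}).to_excel(
            writer, sheet_name=name, header=False, index=False
        )

# Read column A of every sheet, one sheet per iteration.
columns = {}
for name in sheet_names:
    df = pd.read_excel(path, sheet_name=name, header=None, usecols="A")
    # squeeze("columns") turns the one-column DataFrame into a Series
    columns[name] = df.squeeze("columns")

for name, series in columns.items():
    print(name, series.tolist())
```

Passing the sheet name as a string matters here: a string such as "0" is looked up by name, whereas the integer 0 would select the first sheet by position.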

Referring to the answer here: Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Perhaps you can try this:

import pandas as pd
xls = pd.ExcelFile("C:\\AAA.xlsx")
dfs = []
for x in ['0', '1', 'S', 'B', 'U'] :
    dfs.append(xls.parse(x))

Or this, as a dict instead of a list, so you can easily get a particular sheet out to work with:

import pandas as pd
xls = pd.ExcelFile("C:\\AAA.xlsx")
dfs = {}
for x in ['0', '1', 'S', 'B', 'U'] :
    dfs[x] = xls.parse(x)
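
As a variation on the dict approach, the sheet names do not have to be hard-coded: pd.ExcelFile exposes them through its sheet_names attribute. The sketch below is only an illustration under a few assumptions: the workbook is a temporary demo file with invented contents, and an Excel engine such as openpyxl is available.

```python
import os
import tempfile

import pandas as pd

# Hypothetical demo workbook with three sheets, each with an "A" column.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    for name in ["0", "1", "S"]:
        pd.DataFrame({"A": [name + "-a", name + "-b"]}).to_excel(
            writer, sheet_name=name, index=False
        )

# Open the workbook once; the context manager closes the file afterwards.
with pd.ExcelFile(path) as xls:
    # sheet_names lists the sheets in workbook order
    dfs = {name: xls.parse(name) for name in xls.sheet_names}

print(list(dfs))
print(dfs["S"])
```

Opening the file once through ExcelFile avoids re-reading the whole workbook for every sheet, which is what repeated read_excel calls on the same path would do.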

You can simply use:

df = pd.read_excel("C:\\AAA.xlsx", sheet_name=None)  
for key, value in df.items(): 
    ................

When you set sheet_name=None, pandas automatically reads all the sheets in your workbook and returns them as a dict. To iterate over the sheets and their contents, loop over df.items() and do whatever manipulation you need. In the code above, key is the sheet name and value is the content of that sheet as a DataFrame. There is no need to create an extra list object such as sheet_names in your case. Hope this solves your issue.
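
A runnable sketch of the sheet_name=None approach follows. The workbook, its sheet lengths, and the row_counts dict are made up for illustration; the only assumption about the environment is that an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical demo workbook: one column per sheet, different lengths.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
demo_data = {"0": [10, 20], "1": [30, 40, 50], "S": [60]}
with pd.ExcelWriter(path) as writer:
    for name, values in demo_data.items():
        pd.DataFrame({"A": values}).to_excel(
            writer, sheet_name=name, header=False, index=False
        )

# sheet_name=None returns a dict mapping sheet name -> DataFrame
sheets = pd.read_excel(path, sheet_name=None, header=None)

row_counts = {}
for name, frame in sheets.items():
    row_counts[name] = len(frame)  # replace with the real per-sheet work

print(row_counts)
```

Other keyword arguments, header=None here, are applied to every sheet, so this fits best when all sheets share the same layout.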
