简体   繁体   中英

How to use pandas read_excel() for excel file with multi sheets?

I have one excel file with many sheets. There is only one column in every sheet, which is column A. I plan to read the excel file with read_excel() method. Hier is the code:

import pandas as PD

ExcelFile  = "C:\\AAA.xlsx"
SheetNames = ['0', '1', 'S', 'B', 'U'] 
# There are five sheets in this excel file. Those are the sheet names.

PageTotal  = len(SheetNames)

for Page in range(PageTotal):
    df = PD.read_excel(ExcelFile, header=None, squeeze = True, parse_cols = "A" ,sheetname=str(SheetNames[Page]))
    print df
    #do something with df

The problem is, the for loop runs only once. By running the second item in the for loop it shows me the following error text:

  File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 170, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 227, in __init__
    self.book = xlrd.open_workbook(io)
  File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python27\lib\site-packages\xlrd\xlsx.py", line 824, in open_workbook_2007_xml
    x12sst.process_stream(zflo, 'SST')
  File "C:\Python27\lib\site-packages\xlrd\xlsx.py", line 432, in process_stream_iterparse
    for event, elem in ET.iterparse(stream):
  File "<string>", line 103, in next
IndexError: pop from empty stack

As a beginner I have no idea about this error. Could anybody please help me to correct the codes? Thanks.

UPDATE Question:

If it is because that the excel file contains many formulars and external links, why the for loop could still run its first item? Confused.

Why are you using sheetname=str(SheetNames[Page]) ?

If I understand your question properly I think what you want is:

import pandas as PD

excel_file  = r"C:\\AAA.xlsx"
sheet_names = ['0', '1', 'S', 'B', 'U'] 

for sheet_name in sheet_names:
    df = pd.read_excel(excel_file, header=None, squeeze=True, parse_cols="A", sheetname=sheet_name)
    print(df)
    #do something with df 

Referring to the answer here: Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Perhaps you can try this:

import pandas as pd
xls = pd.ExcelFile("C:\\AAA.xlsx")
dfs = []
for x in ['0', '1', 'S', 'B', 'U'] :
    dfs.append(xls.parse(x))

Or this as a dict instead of list so you can easily get a particular sheet out to work with

import pandas as pd
xls = pd.ExcelFile("C:\\AAA.xlsx")
dfs = {}
for x in ['0', '1', 'S', 'B', 'U'] :
    dfs[x] = xls.parse(x)

You can simply use:

df = pd.read_excel("C:\\AAA.xlsx", sheet_name=None)  
for key, value in df.items(): 
    ................

When you set 'sheet_name=None', pandas will automatically read all excel sheets from your workbook. And for iterating over sheets and it's content you can iterate over 'df.items()' and do whatever manipulation you'll have to do. In this above code 'key' is the sheets name and 'value' is the content inside sheet. There is no need to create extra list object, in your case 'sheet_names'. Hope it will solve your issue.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM