
pandas: read an Excel file with multiple sheets and different header offsets

I have to read an Excel file that contains multiple sheets into pandas. Unfortunately, the number of blank rows before the header starts seems to differ from sheet to sheet:

pd.read_excel('foo.xlsx', header=[2,3], sheet_name='first')
pd.read_excel('foo.xlsx', header=[1,2], sheet_name='second')

Is there an elegant way to handle this and read the whole Excel file into a single pandas.DataFrame with an additional column that contains the name of each sheet?

I.e., how can

pd.read_excel(file_name, sheet_name=None)

be passed a different header argument for each sheet, or at least be told to use the first two non-empty rows as the header?
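The sheet-name column on its own is straightforward: with `sheet_name=None`, `read_excel` returns a dict that maps each sheet name to a DataFrame (and a single `header` value then applies to every sheet). A minimal sketch, assuming the frames have already been given consistent headers; the `sheets` dict below is made-up data standing in for that result:

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by
# pd.read_excel("foo.xlsx", sheet_name=None) once every sheet has proper headers
sheets = {
    "first": pd.DataFrame({"header1": [1, 3], "header2": [2, 4]}),
    "second": pd.DataFrame({"header1": [5, 7], "header2": [6, 8]}),
}

# Tag each frame with its sheet name, then stack them into one DataFrame
combined = pd.concat(
    [df.assign(sheet=name) for name, df in sheets.items()],
    ignore_index=True,
)
print(combined)
```

`assign` returns a new frame, so the per-sheet frames stay untouched, and `ignore_index=True` gives the combined frame a fresh 0..n-1 index.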

edit

The question "dynamically skip top blank rows of excel in python pandas" seems to be related, but it is not the solution, as there only the first header row is accepted.

edit2

Description of the exact file structure:

... (varying number of empty rows)
__irrelevant_row__
HEADER_1
HEADER_2

Currently there are either 0 or 1 empty rows, but, as pointed out in the comments, it would be great if this could be handled more dynamically.
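One way to make this dynamic is to read every sheet with `header=None` and locate the header rows only after the blank rows have been dropped. A sketch assuming exactly the layout above (blank rows, one irrelevant row, then two header rows) and that no header cell is blank or merged; `tidy_sheet` is a hypothetical helper, and `raw_sheets` is made-up data standing in for `pd.read_excel('foo.xlsx', sheet_name=None, header=None)`:

```python
import pandas as pd


def tidy_sheet(raw: pd.DataFrame, sheet: str, n_header_rows: int = 2) -> pd.DataFrame:
    """Give a sheet read with header=None a proper (MultiIndex) header.

    Assumed layout: any number of blank rows, one irrelevant row,
    then n_header_rows header rows, then the data rows.
    """
    # Drop the completely blank rows (the leading ones and any others)
    rows = raw.dropna(how="all").reset_index(drop=True)
    # Row 0 is the irrelevant row; the next n_header_rows rows are the header
    header = rows.iloc[1 : 1 + n_header_rows]
    data = rows.iloc[1 + n_header_rows :].reset_index(drop=True)
    # One column level per header row, similar to what header=[2, 3] produces
    data.columns = pd.MultiIndex.from_arrays(header.values.tolist())
    # The data cells shared columns with header text, so re-infer dtypes
    data = data.infer_objects()
    # With MultiIndex columns this key is padded to ("sheet", "")
    data["sheet"] = sheet
    return data


# Made-up stand-in for pd.read_excel("foo.xlsx", sheet_name=None, header=None):
# "first" starts with one blank row, "second" with none
nan = float("nan")
raw_sheets = {
    "first": pd.DataFrame(
        [[nan, nan], ["irrelevant", nan], ["A", "A"], ["x", "y"], [1, 2], [3, 4]]
    ),
    "second": pd.DataFrame([["irrelevant", nan], ["A", "A"], ["x", "y"], [5, 6]]),
}

combined = pd.concat(
    [tidy_sheet(raw, name) for name, raw in raw_sheets.items()],
    ignore_index=True,
)
print(combined)
```

Because the header is located in the data itself, the number of leading blank rows no longer matters. Note that the sheet-name column ends up under the padded key `('sheet', '')`, since the columns are a two-level MultiIndex.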

I am sure this could be done more neatly, but here is one way to achieve (I think) what you want:

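import openpyxl
import pandas as pd

PATH_TO_FILE = 'foo.xlsx'  # path to your workbook

book = openpyxl.load_workbook(PATH_TO_FILE)
frames = []
for sh in book.sheetnames:
    # Load all cell values and drop the completely empty rows, however many there are
    a = pd.DataFrame(book[sh].values).dropna(how='all').reset_index(drop=True)
    # Row 0 is now the irrelevant row; use row 1 as the column names
    a.columns = a.iloc[1]
    a = a.iloc[2:].copy()
    a["sheet"] = sh  # remember which sheet each row came from
    frames.append(a)
# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
b = pd.concat(frames)
b.columns.name = None  # drop the name left over from using a row as the header
print(b)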
#  header1 header2   sheet
#2       1       2   first
#3       3       4   first
#2       1       2  second
#3       3       4  second
#2       1       2     3rd
#3       3       4     3rd

Unfortunately, I have no idea how this interacts with your actual data, but I do hope it helps.
