pandas read excel sheet with multiple sheets and different header offsets

Question

I have to read an Excel file with multiple sheets into pandas. Unfortunately, the number of blank rows before the header differs from sheet to sheet:

pd.read_excel('foo.xlsx', header=[2,3], sheet_name='first')
pd.read_excel('foo.xlsx', header=[1,2], sheet_name='second')

Is there an elegant way to handle this and read the whole workbook into a single pandas.DataFrame with an additional column that contains the name of each sheet?

I.e., how can

pd.read_excel(file_name, sheet_name=None)

be passed a varying header argument, or at least pick the first 2 non-empty rows as the header?

edit

The question "dynamically skip top blank rows of excel in python pandas" seems related, but it is not a solution here because only the first header row is picked up there.

edit2

Description of exact file structure:

... (varying number of empty rows)
__irrelevant_row__
HEADER_1
HEADER_2

Currently there are either 0 or 1 empty rows. As pointed out in the comments, though, it would be great if this were handled more dynamically.

Answer 1

I am certain this could be done more neatly, but one way to achieve what (I think) you want is:

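import openpyxl
import pandas as pd

book = openpyxl.load_workbook(PATH_TO_FILE)
frames = []
for sh in book.sheetnames:
    # Drop fully empty rows so every sheet starts at the same offset.
    a = pd.DataFrame(book[sh].values).dropna(how='all').reset_index(drop=True)
    # Row 0 is the irrelevant row, row 1 holds the header.
    a.columns = a.iloc[1].tolist()
    a = a.iloc[2:].copy()
    # Remember which sheet each row came from.
    a["sheet"] = sh
    frames.append(a)
# DataFrame.append was removed in pandas 2.0, so combine with pd.concat.
b = pd.concat(frames)
print(b)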
#  header1 header2   sheet
#2       1       2   first
#3       3       4   first
#2       1       2  second
#3       3       4  second
#2       1       2     3rd
#3       3       4     3rd

Unfortunately I have no clue how it interacts with your actual data, but I do hope this helps you in your quest.
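
The question asks for two header rows, while the snippet above uses one. A minimal sketch of the same idea with a two-level header follows. It assumes the layout from edit2 (after the blank rows are dropped: one irrelevant row, then HEADER_1, then HEADER_2). The function names frame_with_two_header_rows and read_all_sheets are made up for illustration and are not part of pandas.

```python
import pandas as pd


def frame_with_two_header_rows(raw, sheet_name):
    """Turn a sheet read with header=None into a frame with a 2-level header.

    Assumed layout once fully empty rows are dropped:
    row 0 = irrelevant row, row 1 = HEADER_1, row 2 = HEADER_2, then data.
    """
    # Dropping fully empty rows makes the header offset identical per sheet.
    # Note that this also drops fully empty rows inside the data.
    body = raw.dropna(how="all").reset_index(drop=True)
    header = body.iloc[1:3]
    data = body.iloc[3:].reset_index(drop=True).infer_objects()
    data.columns = pd.MultiIndex.from_arrays(
        [header.iloc[0].tolist(), header.iloc[1].tolist()]
    )
    # With 2-level columns this key is stored as ("sheet", "").
    data["sheet"] = sheet_name
    return data


def read_all_sheets(path):
    """Read every sheet without a header and stack the cleaned frames."""
    raw_sheets = pd.read_excel(path, sheet_name=None, header=None)
    frames = [
        frame_with_two_header_rows(raw, name) for name, raw in raw_sheets.items()
    ]
    return pd.concat(frames, ignore_index=True)
```

Something like read_all_sheets('foo.xlsx') would then give one frame for the whole workbook. If the irrelevant row is not always present, or HEADER_1 uses merged cells (which come back as missing values), the row positions and the header handling would need adjusting.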

1 answer

Answer 1
0 ACCEPTED 2018-11-08 10:24:29
