How to use skiprows in pandas.read_excel() dynamically?

Question

I want to read many different Excel files with the pandas read_excel() function. Sometimes the data starts at A1, other times at B1, other times at C3, and so on. How can I use the skiprows argument of pandas.read_excel() to deal with this?

Answer 1

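As stated in the comments, you cannot set skiprows dynamically: it works from row positions (a count, a list of row indices, or a function of the row index), never from cell contents, so it cannot find where the table starts. Instead, you could read the whole sheet and trim it with a helper function, like this: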

import pandas as pd

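def skip_blank_rows_and_columns(df):
    # Drop rows, then columns, that contain nothing but NaN.
    df = df.dropna(how="all", axis=0).dropna(how="all", axis=1)
    # The first remaining row holds the real column names. This assumes
    # read_excel has not already consumed that row as the header.
    df.columns = df.iloc[0].to_list()
    return df.iloc[1:].reset_index(drop=True)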

For example, with a file that reads in like this:

df = pd.read_excel("example.xlsx")
print(df)
# Output
   Unnamed: 0  Unnamed: 1  Unnamed: 2  ... Unnamed: 4 Unnamed: 5 Unnamed: 6
0         NaN         NaN         NaN  ...        NaN        NaN        NaN
1         NaN         NaN         NaN  ...        NaN        NaN        NaN
2         NaN         NaN         NaN  ...        NaN        NaN        NaN
3         NaN         NaN         NaN  ...        NaN        NaN        NaN
4         NaN         NaN         NaN  ...        NaN        NaN        NaN
5         NaN         NaN         NaN  ...          b          c          d
6         NaN         NaN         NaN  ...          8          6          1
7         NaN         NaN         NaN  ...          8          3          2
8         NaN         NaN         NaN  ...          7          9          0

You can do:

df = skip_blank_rows_and_columns(pd.read_excel("example.xlsx"))
print(df)
# Output
   a  b  c  d
0  1  8  6  1
1  4  8  3  2
2  5  7  9  0
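
One caveat with the helper above: read_excel() uses the first sheet row as the header by default, so a table that really does start at A1 would lose its column names before the helper runs. A minimal sketch of a variant that avoids this, assuming the sheet is read with header=None so the header stays in the data, follows. The name trim_blank_margins and the in-memory stand-in frame are made up for illustration, and infer_objects() is an optional extra to restore numeric dtypes.

```python
import pandas as pd


def trim_blank_margins(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop blank rows/columns, then promote the first remaining row to header.

    Expects a sheet read with header=None, so the header is still a data row.
    """
    trimmed = raw.dropna(how="all", axis=0).dropna(how="all", axis=1)
    header = trimmed.iloc[0].to_list()
    body = trimmed.iloc[1:].reset_index(drop=True)
    body.columns = header
    # Columns are generic objects at this point; let pandas infer tighter dtypes.
    return body.infer_objects()


# Stand-in for pd.read_excel("example.xlsx", header=None): table starts at B3.
raw = pd.DataFrame(
    [
        [None, None, None],
        [None, None, None],
        [None, "a", "b"],
        [None, 1, 8],
        [None, 4, 8],
    ]
)
clean = trim_blank_margins(raw)
print(clean)
```

With a real file, the call would be trim_blank_margins(pd.read_excel("example.xlsx", header=None)). Note that blank rows or columns inside the table are dropped as well, which may or may not be what you want.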

solution1
0 2022-01-15 17:38:54
