I want to read a lot of different Excel files with the pandas read_excel() function. Sometimes the table in the Excel file starts at A1, other times at B1, other times at C3, etc. How can I use the skiprows argument of pandas.read_excel() to deal with this?
As stated in the comments, you cannot set skiprows dynamically. Instead, you could define and use a helper function that drops the fully empty rows and columns and then promotes the first remaining row to the header, like this:
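```python
import pandas as pd

def skip_blank_rows_and_columns(df):
    # Drop rows, then columns, that contain only NaN
    df = df.dropna(how="all", axis=0).dropna(how="all", axis=1)
    # Use the first remaining row as the header
    df.columns = df.iloc[0].to_list()
    # Keep the data rows below the header and renumber them from 0
    return df[1:].reset_index(drop=True)
```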
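If you want to try the helper without an Excel file, here is a minimal sketch that builds a small DataFrame by hand, laid out roughly the way read_excel would return a sheet whose table header sits at C3 (the `raw` frame and its values are made up for illustration). Because the header row is mixed in with the data, the columns the helper returns keep the object dtype; calling `infer_objects()` afterwards converts them to numeric dtypes where possible.

```python
import numpy as np
import pandas as pd


def skip_blank_rows_and_columns(df):
    # Same helper as above: drop all-NaN rows/columns, promote the first row
    df = df.dropna(how="all", axis=0).dropna(how="all", axis=1)
    df.columns = df.iloc[0].to_list()
    return df[1:].reset_index(drop=True)


# Hypothetical frame mimicking pd.read_excel() on a sheet whose table
# header sits at C3: Excel row 1 became the (blank) header, row 2 is
# blank, and columns A and B are empty.
raw = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, np.nan],
        [np.nan, np.nan, "a", "b"],
        [np.nan, np.nan, 1, 8],
        [np.nan, np.nan, 4, 6],
    ],
    columns=[f"Unnamed: {i}" for i in range(4)],
)

# The header row was mixed with the data, so the columns come back as
# object dtype; infer_objects() converts them to integer dtype here.
tidy = skip_blank_rows_and_columns(raw).infer_objects()
print(tidy)
print(tidy.dtypes)
```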
For example, reading the following file as-is gives:
```python
df = pd.read_excel("example.xlsx")
print(df)
# Output
   Unnamed: 0  Unnamed: 1  Unnamed: 2  ... Unnamed: 4 Unnamed: 5 Unnamed: 6
0         NaN         NaN         NaN  ...        NaN        NaN        NaN
1         NaN         NaN         NaN  ...        NaN        NaN        NaN
2         NaN         NaN         NaN  ...        NaN        NaN        NaN
3         NaN         NaN         NaN  ...        NaN        NaN        NaN
4         NaN         NaN         NaN  ...        NaN        NaN        NaN
5         NaN         NaN         NaN  ...          b          c          d
6         NaN         NaN         NaN  ...          8          6          1
7         NaN         NaN         NaN  ...          8          3          2
8         NaN         NaN         NaN  ...          7          9          0
```
With the helper, you get a clean DataFrame instead:
```python
df = skip_blank_rows_and_columns(pd.read_excel("example.xlsx"))
print(df)
# Output
   a  b  c  d
0  1  8  6  1
1  4  8  3  2
2  5  7  9  0
```
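If you do want to use skiprows, one possible workaround is to read each file twice: once with header=None to find the first non-empty row and column, and a second time with the computed skiprows, so that pandas parses the header row and infers the column dtypes itself. This is a minimal sketch, not an established recipe: `find_table_start` and `read_excel_auto` are hypothetical names, and it assumes the Excel engine keeps leading blank rows and columns as NaN (as in the raw output above), so that row positions in the first read line up with sheet rows.

```python
import pandas as pd


def find_table_start(raw):
    """Return the (row, column) positions of the first non-empty row and
    the first non-empty column of a frame read with header=None."""
    non_empty_rows = raw.notna().any(axis=1).to_numpy()
    non_empty_cols = raw.notna().any(axis=0).to_numpy()
    if not non_empty_rows.any():
        raise ValueError("the sheet contains no data")
    # argmax on a boolean array returns the index of the first True
    return int(non_empty_rows.argmax()), int(non_empty_cols.argmax())


def read_excel_auto(path, sheet_name=0):
    """Hypothetical helper: locate the table, then let read_excel parse it.

    Assumes the engine keeps leading blank rows/columns, so positions in
    the header=None read match the sheet's rows and columns."""
    raw = pd.read_excel(path, sheet_name=sheet_name, header=None)
    first_row, first_col = find_table_start(raw)
    # Second read: skip the blank rows so the table's header row is used
    df = pd.read_excel(path, sheet_name=sheet_name, skiprows=first_row)
    # Drop the leading blank columns by position
    return df.iloc[:, first_col:]
```

Usage would then be `df = read_excel_auto("example.xlsx")`. The trade-off is that every file is parsed twice, which may matter for large workbooks; in exchange you get proper column dtypes without a separate `infer_objects()` step.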