Reading Excel file dynamically in Python

Question

I am trying to read an Excel file that has some blank rows as well as blank columns. It is further complicated by junk values that appear before the header.

Currently, I am hardcoding a column name to locate the table. This has two drawbacks: what if that column is not present in the table, and what if the column name also appears as a value inside a column? Is there a way to write a program that automatically detects the table header and reads the table?

Snippet of the code:

import pandas as pd

# header=None keeps every row as data, so the junk rows above the table are preserved
raw_data = pd.read_excel('test_data1.xlsx', sheet_name='Sheet8', header=None)

data_duplicate = pd.DataFrame()

for row in range(raw_data.shape[0]):
    for col in range(raw_data.shape[1]):
        # 'Currency' is the hardcoded column name used to find the header row
        if raw_data.iloc[row, col] == 'Currency':
            data_duplicate = raw_data.iloc[(row+1):].reset_index(drop=True)
            data_duplicate.columns = list(raw_data.iloc[row])
            break
data_duplicate.dropna(axis=1, how='all', inplace=True)
data_duplicate

Also, the number of blank rows + garbage rows before the header is not fixed.

Answer 1

Here's my way: you can drop all rows and all columns that contain only NaN.

import pandas as pd

data = pd.read_excel('test.xlsx')
data = data.dropna(how='all', axis=1)  # drop columns where every cell is NaN
data = data.dropna(how='all', axis=0)  # drop rows where every cell is NaN
data = data.reset_index(drop=True)

It is better to put this into a function if you need to clean multiple DataFrames in the same code:

data = pd.read_excel('test.xlsx')

def remove_nans(df):
    x = df.dropna(how = 'all', axis = 1)
    x = x.dropna(how = 'all', axis = 0)
    x = x.reset_index(drop = True)
    return x

df = remove_nans(data)
print(df)
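
Dropping empty rows and columns does not by itself find the header when junk values sit above it. A rough sketch of one way to guess the header row is below. It is only a heuristic and rests on assumptions that are not in the question: the header is taken to be the first row with at least two filled cells, all of them unique strings, while junk rows above it are taken to have fewer filled cells. The names is_blank, find_header_row and extract_table are made up for this illustration. The sketch works on plain lists so it can be tried without an Excel file; with pandas, the rows could come from pd.read_excel(..., header=None).values.tolist().

```python
import math


def is_blank(cell):
    """True for None, NaN (how pandas shows an empty Excel cell) or whitespace-only text."""
    if cell is None:
        return True
    if isinstance(cell, float) and math.isnan(cell):
        return True
    return isinstance(cell, str) and cell.strip() == ""


def find_header_row(rows, min_columns=2):
    """Return the index of the first row that looks like a header, or None.

    Assumed heuristic: a header row has at least `min_columns` filled cells,
    and every filled cell is a distinct string.
    """
    for index, row in enumerate(rows):
        filled = [cell for cell in row if not is_blank(cell)]
        if len(filled) < min_columns:
            continue
        if all(isinstance(cell, str) for cell in filled) and len(set(filled)) == len(filled):
            return index
    return None


def extract_table(rows):
    """Return (header, data_rows), keeping only the columns that have a header cell."""
    header_index = find_header_row(rows)
    if header_index is None:
        raise ValueError("no row matched the header heuristic")
    header_row = rows[header_index]
    keep = [i for i, cell in enumerate(header_row) if not is_blank(cell)]
    header = [header_row[i] for i in keep]
    data = []
    for row in rows[header_index + 1:]:
        values = [row[i] if i < len(row) else None for i in keep]
        if all(is_blank(value) for value in values):
            continue  # skip rows that are blank within the table's columns
        data.append(values)
    return header, data


# Made-up sample mimicking a sheet read with header=None: a junk title row,
# a blank row, a blank first column, and a blank row inside the table.
nan = float("nan")
sample_rows = [
    [nan, "Report generated by finance", nan, nan],
    [nan, nan, nan, nan],
    [nan, "Currency", "Amount", "Country"],
    [nan, "USD", 10.5, "US"],
    [nan, nan, nan, nan],
    [nan, "EUR", 7.0, "DE"],
]

header, data = extract_table(sample_rows)
print(header)
print(data)
```

The result can be turned back into a DataFrame with pd.DataFrame(data, columns=header). The heuristic will pick the wrong row if a junk row above the table happens to contain two or more distinct text cells, so min_columns may need to be raised for wider tables.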
