Dynamically reading an Excel file in Python

Question

I am trying to read an Excel file that has some blank rows as well as blank columns. The process is more complicated because there are also some junk values before the header.

Currently, I am hardcoding a column name to extract the table. This has two drawbacks: what if the column is not present in the table, and what if the column name also appears among the column values? Is there a way to write a program that automatically detects the table header and reads the table?

Snippet of the code:

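import pandas as pd

# Read without a header so the junk rows above the table stay in the data
raw_data = pd.read_excel('test_data1.xlsx', 'Sheet8', header=None)

data_duplicate = pd.DataFrame()

# Scan every cell for the hardcoded header name
for row in range(raw_data.shape[0]):
    for col in range(raw_data.shape[1]):
        if raw_data.iloc[row, col] == 'Currency':
            # Everything below the matching row is the table body
            data_duplicate = raw_data.iloc[(row+1):].reset_index(drop=True)
            data_duplicate.columns = list(raw_data.iloc[row])
            break  # exits only the inner loop, so a later match overwrites this one
data_duplicate.dropna(axis=1, how='all', inplace=True)
data_duplicate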

Also, the number of blank rows plus garbage rows before the header is not fixed.

Answer 1

Here's my way: you can drop all rows and all columns that contain only NaN:

import pandas as pd

data = pd.read_excel('test.xlsx')
data = data.dropna(how='all', axis=1)  # drop columns that are entirely empty
data = data.dropna(how='all', axis=0)  # drop rows that are entirely empty
data = data.reset_index(drop=True)

It is better to put this into a function if you need to clean multiple DataFrames in the same code:

data = pd.read_excel('test.xlsx')

def remove_nans(df):
    x = df.dropna(how='all', axis=1)
    x = x.dropna(how='all', axis=0)
    x = x.reset_index(drop=True)
    return x

df = remove_nans(data)
print(df)
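
The cleanup above does not by itself pick out the header row when junk values sit above the table. As a rough sketch only, the same idea can be combined with a simple heuristic: treat the first row that has the most non-empty cells as the header. This heuristic is an assumption, not something stated in the question or the answer, and it fails if a junk row is wider than the real header. The function name read_table is hypothetical.

```python
import pandas as pd


def read_table(raw):
    """Guess the header row of a sheet that was read with header=None.

    Assumption: the header is the first row with the largest number of
    non-empty cells. Junk rows above it are expected to be narrower.
    """
    # Count the non-empty cells in each row
    counts = raw.notna().sum(axis=1).to_numpy()
    # argmax returns the first position of the maximum, so ties go to the top row
    header_pos = int(counts.argmax())
    header = raw.iloc[header_pos]

    # Keep only the columns that actually have a header name
    keep = header.notna().to_numpy()
    table = raw.iloc[header_pos + 1:].loc[:, keep].copy()
    table.columns = header[keep].tolist()

    # Drop blank rows inside the table and renumber what is left
    return table.dropna(how='all', axis=0).reset_index(drop=True)
```

With the file from the question, this would be called as read_table(pd.read_excel('test_data1.xlsx', 'Sheet8', header=None)). Reading with header=None matters here, because otherwise pandas uses the first row of the sheet as the header before the function gets a chance to look for the real one.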

Solution 1
0 2020-02-19 04:50:10
