Pandas dataframe read_excel does not consider blank upper left cells as columns?

I'm trying to read an Excel or CSV file into a pandas dataframe. Only the first two columns should be read, and the top row of those two columns supplies the column names. The problem arises when the first column of the top row is empty in the Excel file.

            IDs
2/26/2010    2
3/31/2010    4
4/31/2010    2
5/31/2010    2

Then, the last line of the following code fails:

import pandas as pd

# `request` is the incoming request object of the surrounding web view
uploaded_file = request.FILES['file-name']
if uploaded_file.name.endswith('.csv'):
    df = pd.read_csv(uploaded_file, usecols=[0,1])
else:
    df = pd.read_excel(uploaded_file, usecols=[0,1])

ref_date = 'ref_date'
regime_tag = 'regime_tag'
# fails when only one column was read
df.columns = [ref_date, regime_tag]

Apparently, read_excel only reads one column (i.e. the IDs). With read_csv, however, both columns are read, and the first column is unnamed. I want that behavior in both cases: read both columns regardless of whether the top cells are empty or filled. How do I go about doing that?

What's happening is that the first "column" in the Excel file is being read in as an index, while in the CSV file it's being treated as a column / series.
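To make the column-versus-index distinction concrete, here is a small sketch using read_csv only. The in-memory CSV text is a hypothetical stand-in for the uploaded file, built from the sample rows in the question; it is not the asker's actual data.

```python
import io
import pandas as pd

# Hypothetical CSV text mirroring the question's sample: the first header cell is blank.
csv_text = ",IDs\n2/26/2010,2\n3/31/2010,4\n"

# Default read: the blank header cell becomes a placeholder column name.
as_column = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])
print(list(as_column.columns))

# With index_col=0 the dates move into the index, leaving a single data column.
as_index = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1], index_col=0)
print(list(as_index.columns))
```

In the second case, assigning two names to df.columns fails for the same reason as in the question: only one column is left.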

I recommend you work the other way and amend pd.read_csv to read the first column as an index. Then use reset_index to elevate the index to a series:

if uploaded_file.name.endswith('.csv'):
    df = pd.read_csv(uploaded_file, usecols=[0,1], index_col=0)
else:
    # same arguments as the CSV branch, so both readers treat column 0 as the index
    df = pd.read_excel(uploaded_file, usecols=[0,1], index_col=0)

df = df.reset_index()  # this will elevate index to a column called 'index'

This will give consistent output, i.e. the first series will have the label 'index' (when the top-left header cell is blank) and the index of the dataframe will be the regular pd.RangeIndex.
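As a rough end-to-end check of this approach, the sketch below runs the CSV branch on in-memory text and then applies the column names from the question. The CSV text is an assumed stand-in for the upload; the Excel branch is not exercised here.

```python
import io
import pandas as pd

# Hypothetical CSV text with a blank top-left header cell, as in the question.
csv_text = ",IDs\n2/26/2010,2\n3/31/2010,4\n"

# Read the first column as the index, then promote it back to a regular column.
df = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1], index_col=0)
df = df.reset_index()

# Two columns now exist, so the rename from the question succeeds.
df.columns = ["ref_date", "regime_tag"]
print(df)
```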

You could potentially use a dispatcher to get rid of the unwieldy if / else construct:

file_flag = {True: pd.read_csv, False: pd.read_excel}
read_func = file_flag[uploaded_file.name.endswith('.csv')]

df = read_func(uploaded_file, usecols=[0,1], index_col=0).reset_index()
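A possible variation, not part of the answer above, is to key the dispatcher on the file extension instead of a boolean, so unsupported file types fail loudly. The names READERS and read_two_columns are hypothetical, and the extension list is an assumption.

```python
import io
import os
import pandas as pd

# Hypothetical mapping from file extension to pandas reader function.
READERS = {".csv": pd.read_csv, ".xls": pd.read_excel, ".xlsx": pd.read_excel}

def read_two_columns(file_obj, filename):
    """Read the first two columns, treating column 0 as the index, then reset it."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {ext!r}") from None
    return reader(file_obj, usecols=[0, 1], index_col=0).reset_index()

# Usage with in-memory CSV text standing in for an uploaded file.
df = read_two_columns(io.StringIO(",IDs\n2/26/2010,2\n"), "data.CSV")
print(df.shape)
```

In the question's setting, the call would presumably look like read_two_columns(uploaded_file, uploaded_file.name).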
