Prevent Pandas read_excel / read_csv from assigning (i.e. inferring) an index automatically

Total newbie and this is my first ever question, so apologies in advance for any inadvertent faux pas.

I have a large(ish) dataset in Excel xlsx format that I would like to import into a pandas dataframe. The data has column headers, except for the first column, which does not have a header label. Here is what the Excel sheet looks like:

Raw data

I am using read_excel() in Pandas to read in the data. The code I am using is:

df = pd.read_excel('Raw_Data.xlsx', sheetname=0, labels=None, header=0, index_col=None)

(I have tried index_col = false or 0 but, for obvious reasons, it doesn't change anything.)

The headers for the columns are picked up fine, but the first column, circled in red in the image below, is assigned as the index.

Wrong index

What I am trying to get from the read_excel command is as follows, with the index circled in red:

Correct index

I have other Excel sheets that I have imported into pandas with read_excel(), and for those pandas automatically adds an incrementing numerical index rather than inferring one of the columns as the index.

None of those Excel sheets had a missing label in the column header, though, which might be the issue here, although I am not sure.

I understand that I can use the reset_index() command after the import to get the correct index.
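For reference, a minimal sketch of that reset_index() workaround. The small frame below is a made-up stand-in for the imported sheet (the values and the column name "label" are assumptions, not the real data):

```python
import pandas as pd

# Hypothetical stand-in for the imported sheet: the unlabeled first
# column has ended up as the index instead of a regular column.
df = pd.DataFrame(
    {"x": [1, 3], "y": [2, 4]},
    index=["a", "b"],
)

# reset_index() moves the old index into a regular column (named
# "index" because the index had no name) and installs a fresh
# 0..n-1 integer index. The rename gives that column a real label.
fixed = df.reset_index().rename(columns={"index": "label"})

print(fixed)
```

The rename is optional; without it the recovered column is simply called "index".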

I am wondering whether it can be done within the read_excel() command itself, without having to call reset_index(), i.e. is there any way to prevent an index from being inferred, or to force pandas to add its own index column like it normally does?
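On the read_csv side of the title, the pandas documentation describes index_col=False as the way to stop the first column from being taken as the index when data rows have one more field than the header. A minimal sketch with made-up inline CSV data follows; whether the same keyword helps read_excel with this particular workbook is not something this thread confirms:

```python
import io

import pandas as pd

# Made-up CSV: the header has 3 fields, but each data row ends with a
# trailing comma and therefore has 4.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default behaviour: the extra leading field is used as the index.
inferred = pd.read_csv(io.StringIO(data))

# index_col=False: keep the first field as column "a" and use a
# default 0..n-1 index instead.
forced = pd.read_csv(io.StringIO(data), index_col=False)

print(inferred)
print(forced)
```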

Thank you in advance!

I don't think you can do it with only the read_excel function, because of the missing value in cell A1. If you want to insert something into that cell prior to reading the file with pandas, you could consider using openpyxl as below.

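from openpyxl import load_workbook

path = 'Raw_Data.xlsx'
col_name = 'not_index'
cell = 'A1'

def write_to_cell(path, col_name, cell):
    # Open the workbook and fill the given cell on every sheet where it is empty.
    wb = load_workbook(path)

    for sheet in wb.sheetnames:
        ws = wb[sheet]
        if ws[cell].value is None:
            ws[cell] = col_name

    # Note: this overwrites the original file in place.
    wb.save(path)

# Run this once before reading the file with pd.read_excel(path).
write_to_cell(path, col_name, cell)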
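To see the effect end to end, here is a self-contained sketch that builds a throwaway workbook with a blank A1, applies the same idea, and reads the result back with pandas. The file name demo.xlsx, the sample values, and the helper name label_empty_cell are all assumptions made for illustration:

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook


def label_empty_cell(path, col_name="not_index", cell="A1"):
    """Write col_name into `cell` on every sheet where that cell is empty."""
    wb = load_workbook(path)
    for ws in wb.worksheets:
        if ws[cell].value is None:
            ws[cell] = col_name
    wb.save(path)


# Build a throwaway workbook whose first header cell (A1) is blank.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.xlsx")
wb = Workbook()
ws = wb.active
ws.append([None, "x", "y"])  # header row with an empty first cell
ws.append(["a", 1, 2])
ws.append(["b", 3, 4])
wb.save(path)

# Fill in the missing header, then read the sheet as usual.
label_empty_cell(path)
df = pd.read_excel(path, sheet_name=0, header=0, index_col=None)

print(df.columns.tolist())
print(df.index.tolist())
```

With the header filled in, the first column should come back as an ordinary column and the index as the default integer range. Note that sheet_name is the keyword used by current pandas releases.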
