
pandas read_excel, usecols with list input generating an empty DataFrame

I want to read only one specific column from an Excel file into a pandas DataFrame. My code is:

import pandas as pd
file = pd.read_excel("3_Plants sorted on PLF age cost.xlsx",sheet_name="Age>25",index_col="Developer",usecols="Name of Project")

But I get an empty DataFrame as output. However, when I use:

import pandas as pd
file = pd.read_excel("3_Plants sorted on PLF age cost.xlsx",sheet_name="Age>25",index_col="Developer",usecols=2)

I get the desired result.

I have to do this for many files in a loop, and the position of the column keeps changing, so I have to select it by name rather than by position.

Also, I can't load the full file into a DataFrame and then use df["column_name"], because my Excel file is too large (150 MB). Doing so makes the process very slow and sometimes raises a memory error.

Thanks in advance.

As Tomas Farias mentioned, usecols doesn't take cell values. One possible approach is to read a few rows, find the position of the column, and then read the file a second time.

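import pandas as pd
path = "3_Plants sorted on PLF age cost.xlsx"
# First pass: read a couple of rows just to learn the current column order.
col = pd.read_excel(path, sheet_name="Age>25", nrows=2).columns
# Pass 0-based positions as a list; "Developer" must be included so index_col can find it.
k = [col.get_loc("Developer"), col.get_loc("Name of Project")]
file = pd.read_excel(path, sheet_name="Age>25", index_col="Developer", usecols=k)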
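A note on why the original call came back empty: when usecols is a single string, read_excel treats it as Excel column letters or ranges (like "A:C"), not as a header name. On more recent pandas versions (0.24 and later, as far as I know) usecols also accepts a list of header names, which may make the two-pass lookup unnecessary. Here is a minimal sketch under that assumption; read_named_columns is a hypothetical helper name, not part of pandas, and it sets the index after loading rather than relying on index_col.

```python
import pandas as pd


def read_named_columns(path, sheet_name, index_name, value_names):
    """Hypothetical helper: load only the named columns from one sheet."""
    # A list of strings is matched against the header names; a bare string
    # would instead be parsed as Excel column letters such as "A:C".
    wanted = [index_name, *value_names]
    frame = pd.read_excel(path, sheet_name=sheet_name, usecols=wanted)
    # Set the index afterwards so the result does not depend on how
    # index_col interacts with usecols in a given pandas version.
    return frame.set_index(index_name)
```

For the file in the question, the call would look like read_named_columns("3_Plants sorted on PLF age cost.xlsx", "Age>25", "Developer", ["Name of Project"]), and the same call can be repeated inside the loop over files regardless of where the column sits.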

You can save/convert the .xlsx file to .csv and then use: pd.read_csv('filename.csv', usecols=[])
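To make the CSV route concrete, here is a small sketch of read_csv with usecols given as a list of header names. The in-memory text and its values are made up for illustration and stand in for a CSV exported from the workbook; only the two column names come from the question.

```python
import io

import pandas as pd

# Stand-in for a CSV exported from the workbook; the rows are invented.
csv_text = (
    "Developer,Capacity,Name of Project\n"
    "Acme,120,Plant A\n"
    "Beta,80,Plant B\n"
)

# Only the listed columns are kept; "Developer" is included so that
# it can serve as the index.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Developer", "Name of Project"],
    index_col="Developer",
)
print(df)
```

With a real file, io.StringIO(csv_text) would be replaced by the path to the exported CSV.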
