pandas read_excel: usecols with list input generating an empty DataFrame
I want to read only a specific column from an Excel file into a pandas DataFrame. My code is:
import pandas as pd
file = pd.read_excel("3_Plants sorted on PLF age cost.xlsx",sheet_name="Age>25",index_col="Developer",usecols="Name of Project")
but I get an empty DataFrame as output. However, when I use
import pandas as pd
file = pd.read_excel("3_Plants sorted on PLF age cost.xlsx",sheet_name="Age>25",index_col="Developer",usecols=2)
I get the desired result.
I have to do this for many files in a loop, and the position of the column keeps changing, so I have to select it by name rather than by position.
Also, I can't load the full file into a DataFrame and then use df["column_name"], because my Excel file is too large (150 MB); this makes the process very slow and sometimes causes a memory error.
Thanks in advance.
As mentioned by Tomas Farias, usecols doesn't take cell values. A possible approach is to read a few rows, find the position of the column, and then read the file a second time:
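import pandas as pd
path = "3_Plants sorted on PLF age cost.xlsx"
# First pass: read only a couple of rows to get the header.
col = pd.read_excel(path, sheet_name="Age>25", nrows=2).columns
# 0-based positions of the index column and the wanted column.
k = [col.get_loc("Developer"), col.get_loc("Name of Project")]
# Second pass: parse only those columns; newer pandas versions expect a list here, not a bare int.
file = pd.read_excel(path, sheet_name="Age>25", index_col="Developer", usecols=k)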
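For what it's worth, a plain string passed to usecols is interpreted as Excel column letters or ranges (such as "A:C"), which would explain the empty result in the question. Recent pandas versions (0.24 and later, as far as I know) also accept a list of header names, which avoids the first pass entirely. A minimal sketch, assuming such a version and the openpyxl engine are installed; the helper name read_named_columns and the small demo workbook are made up for illustration:

```python
import os
import tempfile

import pandas as pd


def read_named_columns(path, sheet_name, index_col, columns):
    """Read only `columns` (plus `index_col`) from one sheet, selected by header name."""
    wanted = [index_col, *columns]
    # A list of strings is matched against the header row, not against column letters.
    return pd.read_excel(path, sheet_name=sheet_name, index_col=index_col, usecols=wanted)


# Tiny stand-in workbook so the sketch runs on its own (hypothetical data).
demo = pd.DataFrame(
    {
        "Developer": ["A", "B"],
        "Capacity": [10, 20],
        "Name of Project": ["P1", "P2"],
    }
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")
    demo.to_excel(path, sheet_name="Age>25", index=False)
    result = read_named_columns(path, "Age>25", "Developer", ["Name of Project"])

print(result)
```

Because the columns are chosen by name, the same call can be reused in a loop over files even when the column positions differ. Note that the index column has to be included in the list passed to usecols.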
You can save/convert the .xlsx file to .csv and then use: pd.read_csv('filename.csv', usecols=[])
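If converting to CSV is an option, read_csv accepts header names directly in usecols, so only the listed columns are parsed. A small sketch using an in-memory stand-in for the converted file (the rows are invented for illustration):

```python
import io

import pandas as pd

# Stand-in for the converted CSV file; the rows are made up.
csv_text = (
    "Developer,Capacity,Name of Project\n"
    "A,10,P1\n"
    "B,20,P2\n"
)

# usecols takes header names; the index column has to be listed too.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Developer", "Name of Project"],
    index_col="Developer",
)
print(df)
```

With a real file, the io.StringIO object would simply be replaced by the path to the .csv file.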