How to read Excel data by column name in Python using xlrd
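I am trying to read data from a large Excel file (nearly 100,000 rows). I am using the xlrd module in Python to fetch the data. I want to fetch data by column name (Cascade, Schedule Name, Market) rather than by column number (0, 1, 2), because the columns in my Excel file are not fixed. I already know how to fetch the data when the columns are fixed.
Here is the code I use to fetch data from an Excel file with fixed columns: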
import xlrd

file_location = r"C:\Users\Desktop\Vision.xlsx"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)
print(sheet.ncols, sheet.nrows, sheet.name, sheet.number)
for i in range(sheet.nrows):
    flag = 0
    for j in range(sheet.ncols):
        value = sheet.cell(i, j).value
If anyone has a solution for this, please let me know.
Thanks
Alternatively, you could use pandas, a comprehensive data analysis library with built-in Excel I/O capabilities.
import pandas as pd
file_location =r"C:\Users\esatnir\Desktop\Sprint Vision.xlsx"
# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)
# Reduce dataframe to target columns (by filtering on column names)
df = df[['Cascade', 'Schedule Name', 'Market']]
A quick view of the resulting dataframe df shows:
In [1]: df
Out[1]:
   Cascade     Schedule Name                Market
0  SF05UB0  DO Macro Upgrade  Upper Central Valley
1  DE03HO0  DO Macro Upgrade                Toledo
2  SF73XC4  DO Macro Upgrade                SF Bay
Your column names are in the first row of the spreadsheet, right? So read the first row and build a mapping from names to column indices.
column_pos = [ (sheet.cell(0, i).value, i) for i in range(sheet.ncols) ]
colidx = dict(column_pos)
Or as a one-liner:
colidx = dict( (sheet.cell(0, i).value, i) for i in range(sheet.ncols) )
You can then use the mapping to translate column names into indices, for example:
print(sheet.cell(5, colidx["Schedule Name"]).value)
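A lookup such as colidx["Schedule Name"] raises a bare KeyError if the header is missing or carries stray whitespace. The following is a minimal sketch of a more defensive mapping, not part of the original answer. The helper name build_column_index and its required parameter are hypothetical, and the header is passed in as a plain list (with xlrd it could come from sheet.row_values(0)).

```python
def build_column_index(header, required):
    """Map each header name to its column position and check required names.

    `header` is a list of header cell values; `required` lists the column
    names the caller intends to use. Both names are illustrative only.
    """
    colidx = {}
    for i, name in enumerate(header):
        key = str(name).strip()       # tolerate stray whitespace in headers
        colidx.setdefault(key, i)     # keep the first occurrence of a repeated name
    missing = [name for name in required if name not in colidx]
    if missing:
        raise KeyError("Missing columns: {}".format(", ".join(missing)))
    return colidx


# Stand-in header row; the columns are deliberately not in a fixed order.
header = ["Market", " Cascade ", "Schedule Name"]
colidx = build_column_index(header, ["Cascade", "Schedule Name", "Market"])
print(colidx["Cascade"])  # prints 1
```

Failing early with the list of missing names tends to be easier to debug than a KeyError raised deep inside a loop over 100,000 rows.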
To get an entire column, you can use a list comprehension:
schedule = [ sheet.cell(i, colidx["Schedule Name"]).value for i in range(1, sheet.nrows) ]
If you really want to, you could create a wrapper around the cell function that handles the translation. But I think this is simple enough.
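For readers who do want such a wrapper, here is one possible sketch. It relies only on the sheet.cell(row, col).value, sheet.nrows and sheet.ncols members used above. The class NamedSheet and its methods are hypothetical names, and FakeSheet/FakeCell are stand-ins so the example runs without an Excel file.

```python
class NamedSheet:
    """Wrap a sheet-like object so cells can be addressed by column name."""

    def __init__(self, sheet, header_row=0):
        self.sheet = sheet
        self.header_row = header_row
        # Build the name -> column index mapping from the header row
        self.colidx = {sheet.cell(header_row, i).value: i
                       for i in range(sheet.ncols)}

    def cell_value(self, row, name):
        """Return the value at the given row in the column called `name`."""
        return self.sheet.cell(row, self.colidx[name]).value

    def column(self, name):
        """Return all values below the header in the column called `name`."""
        col = self.colidx[name]
        return [self.sheet.cell(r, col).value
                for r in range(self.header_row + 1, self.sheet.nrows)]


# Hypothetical stand-ins that mimic only the parts of an xlrd sheet used here
class FakeCell:
    def __init__(self, value):
        self.value = value


class FakeSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, r, c):
        return FakeCell(self._rows[r][c])


sheet = FakeSheet([
    ["Cascade", "Schedule Name", "Market"],
    ["SF05UB0", "DO Macro Upgrade", "Upper Central Valley"],
    ["DE03HO0", "DO Macro Upgrade", "Toledo"],
])
named = NamedSheet(sheet)
print(named.cell_value(1, "Market"))  # prints Upper Central Valley
print(named.column("Cascade"))        # prints ['SF05UB0', 'DE03HO0']
```

With a real workbook, the object returned by workbook.sheet_by_index(0) would presumably be passed in place of the FakeSheet instance.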
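Comment: still not working when the header of
fieldnames = ['Cascade', 'Market', 'Schedule', 'Name']
and
Sheet(['Cascade', 'Schedule', 'Name', 'Market'])
are equal.
Keeping the order of fieldnames in col_idx was not my initial goal.
Question: I want to get the data by column name
The following OOP solution will work: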
from collections import OrderedDict


class OrderedByName():
    """
    Provides a generator method, to iterate in Column Name ordered sequence
    Provides subscription, to get columns index by name. using class[name]
    """
    def __init__(self, sheet, fieldnames, row=0):
        """
        Create a OrderedDict {name:index} from 'fieldnames'
        :param sheet: The Worksheet to use
        :param fieldnames: Ordered List of Column Names
        :param row: Default Row Index for the Header Row
        """
        self.columns = OrderedDict.fromkeys(fieldnames, None)
        for n in range(sheet.ncols):
            self.columns[sheet.cell(row, n).value] = n

    @property
    def ncols(self):
        """
        Generator, equal usage as range(xlrd.ncols),
        to iterate columns in ordered sequence
        :return: yield Column index
        """
        for idx in self.columns.values():
            yield idx

    def __getitem__(self, item):
        """
        Make class object subscriptable
        :param item: Column Name
        :return: Columns index
        """
        return self.columns[item]
Usage:
# Worksheet Data
sheet([['Schedule', 'Cascade', 'Market'],
       ['SF05UB0', 'DO Macro Upgrade', 'Upper Central Valley'],
       ['DE03HO0', 'DO Macro Upgrade', 'Toledo'],
       ['SF73XC4', 'DO Macro Upgrade', 'SF Bay']]
      )
# Instantiate with Ordered List of Column Names
# NOTE the different Order of Column Names
by_name = OrderedByName(sheet, ['Cascade', 'Market', 'Schedule'])
# Iterate all Rows and all Columns Ordered as instantiated
for row in range(sheet.nrows):
    for col in by_name.ncols:
        value = sheet.cell(row, col).value
        print("cell({}).value == {}".format((row,col), value))
Output:
cell((0, 1)).value == Cascade
cell((0, 2)).value == Market
cell((0, 0)).value == Schedule
cell((1, 1)).value == DO Macro Upgrade
cell((1, 2)).value == Upper Central Valley
cell((1, 0)).value == SF05UB0
cell((2, 1)).value == DO Macro Upgrade
cell((2, 2)).value == Toledo
cell((2, 0)).value == DE03HO0
cell((3, 1)).value == DO Macro Upgrade
cell((3, 2)).value == SF Bay
cell((3, 0)).value == SF73XC4
Get the index of a column by name:
print("cell{}.value == {}".format((1, by_name['Schedule']), sheet.cell(1, by_name['Schedule']).value)) #>>> cell(1, 0).value == SF05UB0
Tested with Python: 3.5
You can use pandas. Below is sample code for identifying the columns and rows in an Excel sheet.
import pandas as pd
file_location =r"Your_Excel_Path"
# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)
total_rows=len(df.axes[0])
total_cols=len(df.axes[1])
# Print total number of rows in an excel sheet
print("Number of Rows: "+str(total_rows))
# Print total number of columns in an excel sheet
print("Number of Columns: "+str(total_cols))
# Print column names in an excel sheet
print(df.columns.tolist())
Now, once you have the column data, you can convert it into a list of values.
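As a minimal sketch of that last step, a single dataframe column can be turned into a plain Python list with Series.tolist(). The dataframe below is built in memory as a stand-in for the result of pd.read_excel(file_location), and the column names are taken from the question.

```python
import pandas as pd

# In-memory stand-in for the dataframe returned by pd.read_excel(file_location)
df = pd.DataFrame({
    "Cascade": ["SF05UB0", "DE03HO0", "SF73XC4"],
    "Market": ["Upper Central Valley", "Toledo", "SF Bay"],
})

# Select one column by name and convert it to a list of values
markets = df["Market"].tolist()
print(markets)  # prints ['Upper Central Valley', 'Toledo', 'SF Bay']
```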