How to read Excel data by column name in python using xlrd

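I am trying to read data from a large Excel file (nearly 100,000 rows). I am using the xlrd module in Python to fetch the data from Excel. I want to fetch data by column name (Cascade, Schedule Name, Market) instead of by column number (0, 1, 2), because the columns in my Excel file are not fixed. I already know how to fetch the data when the columns are fixed.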


Here is my code for fetching data from Excel with fixed columns:

import xlrd

file_location =r"C:\Users\Desktop\Vision.xlsx"
workbook=xlrd.open_workbook(file_location)
sheet= workbook.sheet_by_index(0)
print(sheet.ncols,sheet.nrows,sheet.name,sheet.number)

for i in range(sheet.nrows):
   flag = 0
   for j in range(sheet.ncols):
      value=sheet.cell(i,j).value

If anyone has a solution for this, please let me know.

Thanks

Alternatively, you could use pandas, a comprehensive data analysis library with built-in Excel I/O capabilities.

import pandas as pd

file_location =r"C:\Users\esatnir\Desktop\Sprint Vision.xlsx"

# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)

# Reduce dataframe to target columns (by filtering on column names)
df = df[['Cascade', 'Schedule Name', 'Market']]

A quick look at the resulting DataFrame df shows:

In [1]: df
Out[1]:
   Cascade     Schedule Name                Market
0  SF05UB0  DO Macro Upgrade  Upper Central Valley
1  DE03HO0  DO Macro Upgrade                Toledo
2  SF73XC4  DO Macro Upgrade                SF Bay

Your column names are in the first row of the spreadsheet, right? So read the first row and build a mapping from names to column indices.

column_pos = [ (sheet.cell(0, i).value, i) for i in range(sheet.ncols) ]
colidx = dict(column_pos)

Or as a one-liner:

colidx = dict( (sheet.cell(0, i).value, i) for i in range(sheet.ncols) )

Then you can use the mapping to translate column names into indices, for example:

print(sheet.cell(5, colidx["Schedule Name"]).value)

To get an entire column, you can use a list comprehension:

schedule = [ sheet.cell(i, colidx["Schedule Name"]).value for i in range(1, sheet.nrows) ]
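With close to 100,000 rows it can be convenient to walk the sheet once and receive each row as a dictionary keyed by column name. The following is only a minimal sketch of that idea, not part of xlrd: `iter_rows_by_name` is a hypothetical helper, and `FakeSheet` is a stand-in that exposes the same `nrows`, `ncols` and `cell(row, col).value` interface used above, so the example runs without an Excel file.

```python
from collections import namedtuple

# Stand-in for an xlrd sheet (assumption: only nrows, ncols and
# cell(row, col).value are needed). With a real file, use the sheet
# returned by workbook.sheet_by_index(0) instead.
Cell = namedtuple("Cell", "value")


class FakeSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, row, col):
        return Cell(self._rows[row][col])


def iter_rows_by_name(sheet, names, header_row=0):
    """Yield one dict per data row, restricted to the requested column names."""
    # Build the name -> column index mapping from the header row.
    colidx = {sheet.cell(header_row, i).value: i for i in range(sheet.ncols)}
    missing = [n for n in names if n not in colidx]
    if missing:
        raise KeyError("columns not found in header: {}".format(missing))
    for r in range(header_row + 1, sheet.nrows):
        yield {n: sheet.cell(r, colidx[n]).value for n in names}


# The column order in the sheet does not matter, only the names do.
sheet = FakeSheet([
    ["Market", "Cascade", "Schedule Name"],
    ["Toledo", "DE03HO0", "DO Macro Upgrade"],
    ["SF Bay", "SF73XC4", "DO Macro Upgrade"],
])
rows = list(iter_rows_by_name(sheet, ["Cascade", "Market"]))
print(rows[0])
```

Raising an error for unknown names is a design choice made here so that a renamed or missing header is noticed right away instead of producing wrong data.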

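If you really want to, you could create a wrapper around the cell function that handles the name lookup. But I think this is simple enough as it is.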
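For readers who do want such a wrapper, one possible shape is sketched below. `NamedSheet` and its methods are hypothetical names chosen for illustration, and `FakeSheet` again only imitates the `nrows`, `ncols` and `cell(row, col).value` interface of an xlrd sheet so that the snippet is runnable on its own.

```python
from collections import namedtuple

# Stand-in for an xlrd sheet (assumption, see above); swap in a real sheet.
Cell = namedtuple("Cell", "value")


class FakeSheet:
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, row, col):
        return Cell(self._rows[row][col])


class NamedSheet:
    """Hypothetical wrapper: address cells by (row, column name)."""

    def __init__(self, sheet, header_row=0):
        self.sheet = sheet
        self.header_row = header_row
        # Same name -> index mapping as colidx, built once.
        self.colidx = {
            sheet.cell(header_row, i).value: i for i in range(sheet.ncols)
        }

    def value(self, row, name):
        """Return the value of the cell in `row` under column `name`."""
        return self.sheet.cell(row, self.colidx[name]).value

    def column(self, name):
        """Return all values below the header in column `name`."""
        col = self.colidx[name]
        first = self.header_row + 1
        return [self.sheet.cell(r, col).value
                for r in range(first, self.sheet.nrows)]


sheet = FakeSheet([
    ["Cascade", "Schedule Name", "Market"],
    ["SF05UB0", "DO Macro Upgrade", "Upper Central Valley"],
    ["DE03HO0", "DO Macro Upgrade", "Toledo"],
])
named = NamedSheet(sheet)
first_schedule = named.value(1, "Schedule Name")
markets = named.column("Market")
print(first_schedule, markets)
```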

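Comment: it still does not work when the headers are equal, i.e. when
fieldnames = ['Cascade', 'Market', 'Schedule', 'Name']
and Sheet(['Cascade', 'Schedule', 'Name', 'Market']) contain the same names.

Keeping the order of fieldnames in col_idx was not my initial goal.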


Question: I want to fetch data by column name

The following OOP solution will work:

from collections import OrderedDict


class OrderedByName():
    """
    Provides a generator property to iterate column indices in column-name order
    Provides subscription to get a column's index by name, using obj[name]
    """
    def __init__(self, sheet, fieldnames, row=0):
        """
        Create an OrderedDict {name: index} from 'fieldnames'
        :param sheet: The Worksheet to use
        :param fieldnames: Ordered List of Column Names
        :param row: Default Row Index for the Header Row
        """
        # Pre-fill the keys so iteration follows the order of 'fieldnames'
        self.columns = OrderedDict.fromkeys(fieldnames, None)
        # Fill in the actual column index for every header cell
        for n in range(sheet.ncols):
            self.columns[sheet.cell(row, n).value] = n

    @property
    def ncols(self):
        """
        Generator, equal usage as range(xlrd.ncols), 
          to iterate columns in ordered sequence
        :return: yield Column index
        """
        for idx in self.columns.values():
            yield idx

    def __getitem__(self, item):
        """
        Make class object subscriptable
        :param item: Column Name
        :return: Columns index
        """
        return self.columns[item]

Usage

# Worksheet Data
sheet([['Schedule', 'Cascade', 'Market'],
       ['SF05UB0', 'DO Macro Upgrade', 'Upper Cnetral Valley'],
       ['DE03HO0', 'DO Macro Upgrade', 'Toledo'],
       ['SF73XC4', 'DO Macro Upgrade', 'SF Bay']]
      )

# Instantiate with Ordered List of Column Names
# NOTE the different Order of Column Names
by_name = OrderedByName(sheet, ['Cascade', 'Market', 'Schedule'])

# Iterate all Rows and all Columns Ordered as instantiated
for row in range(sheet.nrows):
    for col in by_name.ncols:
        value = sheet.cell(row, col).value
        print("cell({}).value == {}".format((row,col), value))

Output

cell((0, 1)).value == Cascade
cell((0, 2)).value == Market
cell((0, 0)).value == Schedule
cell((1, 1)).value == DO Macro Upgrade
cell((1, 2)).value == Upper Cnetral Valley
cell((1, 0)).value == SF05UB0
cell((2, 1)).value == DO Macro Upgrade
cell((2, 2)).value == Toledo
cell((2, 0)).value == DE03HO0
cell((3, 1)).value == DO Macro Upgrade
cell((3, 2)).value == SF Bay
cell((3, 0)).value == SF73XC4

Get the index of a column by name

print("cell{}.value == {}".format((1, by_name['Schedule']), sheet.cell(1, by_name['Schedule']).value)) #>>> cell(1, 0).value == SF05UB0

Tested with Python: 3.5

You can use pandas. Below is sample code for identifying the columns and rows of an Excel sheet.

import pandas as pd

file_location =r"Your_Excel_Path"

# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)


total_rows=len(df.axes[0])
total_cols=len(df.axes[1])

# Print total number of rows in an excel sheet
print("Number of Rows: "+str(total_rows))

# Print total number of columns in an excel sheet
print("Number of Columns: "+str(total_cols))

# Print column names in an excel sheet
print(df.columns.tolist())

Now, once you have the column data, you can convert it into a list of values.
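As a sketch of that last step: a column selected by name is a pandas Series, and `Series.tolist()` turns it into a plain Python list. The DataFrame below is built in memory from the sample rows shown in the earlier pandas answer, purely so the snippet runs without an Excel file; in practice `df` would come from `pd.read_excel(file_location)`.

```python
import pandas as pd

# In-memory stand-in for the DataFrame returned by pd.read_excel(...)
df = pd.DataFrame({
    "Cascade": ["SF05UB0", "DE03HO0", "SF73XC4"],
    "Schedule Name": ["DO Macro Upgrade"] * 3,
    "Market": ["Upper Central Valley", "Toledo", "SF Bay"],
})

# One column, selected by name, as a plain Python list
cascade_values = df["Cascade"].tolist()

# Several columns at once, as one dict per row
records = df[["Cascade", "Market"]].to_dict(orient="records")

print(cascade_values)
print(records[0])
```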
