
How to read Excel data by column name in python using xlrd

I am trying to read data from a large Excel file (almost 100,000 rows). I am using the xlrd module in Python to fetch the data. I want to fetch data by column name (Cascade, Schedule Name, Market) instead of by column number (0, 1, 2), because the columns in my Excel file are not in fixed positions. I know how to fetch the data when the columns are fixed.


Here is the code I use to fetch data from the Excel file when the columns are fixed:

import xlrd

file_location =r"C:\Users\Desktop\Vision.xlsx"
workbook=xlrd.open_workbook(file_location)
sheet= workbook.sheet_by_index(0)
print(sheet.ncols,sheet.nrows,sheet.name,sheet.number)

for i in range(sheet.nrows):
   flag = 0
   for j in range(sheet.ncols):
      value=sheet.cell(i,j).value

If anyone has a solution for this, kindly let me know.

Thanks

Alternatively, you could use pandas, a comprehensive data analysis library with built-in Excel I/O capabilities.

import pandas as pd

file_location =r"C:\Users\esatnir\Desktop\Sprint Vision.xlsx"

# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)

# Reduce dataframe to target columns (by filtering on column names)
df = df[['Cascade', 'Schedule Name', 'Market']]

A quick view of the resulting dataframe df shows:

In [1]: df
Out[1]:
   Cascade     Schedule Name                Market
0  SF05UB0  DO Macro Upgrade  Upper Central Valley
1  DE03HO0  DO Macro Upgrade                Toledo
2  SF73XC4  DO Macro Upgrade                SF Bay

Your column names are in the first row of the spreadsheet, right? So read the first row and construct a mapping from names to column indices.

column_pos = [ (sheet.cell(0, i).value, i) for i in range(sheet.ncols) ]
colidx = dict(column_pos)

Or as a one-liner:

colidx = dict( (sheet.cell(0, i).value, i) for i in range(sheet.ncols) )

You can then use the mapping to translate column names into column indices, for example:

print(sheet.cell(5, colidx["Schedule Name"]).value)

To get an entire column, you can use a list comprehension:

schedule = [ sheet.cell(i, colidx["Schedule Name"]).value for i in range(1, sheet.nrows) ]

If you really wanted to, you could create a wrapper for the cell function that handles the name lookup. But I think this is simple enough.
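For readers who do want such a wrapper, here is a minimal sketch. `FakeSheet` and `FakeCell` are hypothetical stand-ins that mimic only the `cell(row, col).value`, `nrows` and `ncols` parts of an xlrd sheet, so the example runs without an Excel file. `build_colidx` and `cell_by_name` are illustrative names and are not part of xlrd.

```python
from collections import namedtuple

# Hypothetical stand-ins for an xlrd sheet, so the sketch runs without a file.
FakeCell = namedtuple("FakeCell", "value")


class FakeSheet:
    """Mimics only sheet.cell(row, col).value, sheet.nrows and sheet.ncols."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, row, col):
        return FakeCell(self._rows[row][col])


def build_colidx(sheet, header_row=0):
    """Map each header name to its column index."""
    return {sheet.cell(header_row, i).value: i for i in range(sheet.ncols)}


def cell_by_name(sheet, colidx, row, name):
    """Return the value at the given row in the column called `name`."""
    return sheet.cell(row, colidx[name]).value


sheet = FakeSheet([
    ["Cascade", "Schedule Name", "Market"],
    ["SF05UB0", "DO Macro Upgrade", "Upper Central Valley"],
    ["DE03HO0", "DO Macro Upgrade", "Toledo"],
])
colidx = build_colidx(sheet)
first_market = cell_by_name(sheet, colidx, 1, "Market")
cascades = [cell_by_name(sheet, colidx, r, "Cascade") for r in range(1, sheet.nrows)]
print(first_market)
print(cascades)
```

With a real workbook, the object returned by `workbook.sheet_by_index(0)` would take the place of `FakeSheet`. Note that if two header cells share the same text, the mapping keeps only the last one.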

Comment: still not working when the header of fieldnames = ['Cascade', 'Market', 'Schedule', 'Name'] and Sheet(['Cascade', 'Schedule', 'Name', 'Market']) are equal.

Keeping the order of fieldnames in col_idx was not my initial goal.


Question: I want to fetch data by column name

The following OOP solution will work:

class OrderedByName():
    """
    Provides a generator property to iterate in Column Name ordered sequence
    Provides subscription to get a column's index by name, using obj[name]
    """
    def __init__(self, sheet, fieldnames, row=0):
        """
        Create an OrderedDict {name: index} from 'fieldnames'
        :param sheet: The Worksheet to use
        :param fieldnames: Ordered List of Column Names
        :param row: Default Row Index for the Header Row
        """
        from collections import OrderedDict
        self.columns = OrderedDict.fromkeys(fieldnames, None)
        for n in range(sheet.ncols):
            self.columns[sheet.cell(row, n).value] = n

    @property
    def ncols(self):
        """
        Generator, equal usage as range(xlrd.ncols), 
          to iterate columns in ordered sequence
        :return: yield Column index
        """
        for idx in self.columns.values():
            yield idx

    def __getitem__(self, item):
        """
        Make class object subscriptable
        :param item: Column Name
        :return: Columns index
        """
        return self.columns[item]
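
One thing to be aware of with the class above: a name in `fieldnames` that does not occur in the header row keeps the index `None`, and header names that are not listed in `fieldnames` are appended to the mapping. The following is a minimal sketch of a stricter lookup that fails early instead. `resolve_columns` is a hypothetical helper, not part of xlrd, and it assumes the header row has already been read into a plain list.

```python
def resolve_columns(header, fieldnames):
    """Return [(name, index), ...] in the order given by `fieldnames`.

    `header` is the list of values read from the header row.
    Raises KeyError listing every requested name the header lacks.
    """
    positions = {name: idx for idx, name in enumerate(header)}
    missing = [name for name in fieldnames if name not in positions]
    if missing:
        raise KeyError("columns not found in header: {}".format(missing))
    return [(name, positions[name]) for name in fieldnames]


header = ["Schedule", "Cascade", "Market"]
ordered = resolve_columns(header, ["Cascade", "Market", "Schedule"])
print(ordered)
```

With an xlrd sheet, `header` could be built as `[sheet.cell(0, n).value for n in range(sheet.ncols)]`, following the same `cell` access used in the class.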

Usage:

# Worksheet Data
sheet([['Schedule', 'Cascade', 'Market'],
       ['SF05UB0', 'DO Macro Upgrade', 'Upper Cnetral Valley'],
       ['DE03HO0', 'DO Macro Upgrade', 'Toledo'],
       ['SF73XC4', 'DO Macro Upgrade', 'SF Bay']]
      )

# Instantiate with Ordered List of Column Names
# NOTE the different Order of Column Names
by_name = OrderedByName(sheet, ['Cascade', 'Market', 'Schedule'])

# Iterate all Rows and all Columns Ordered as instantiated
for row in range(sheet.nrows):
    for col in by_name.ncols:
        value = sheet.cell(row, col).value
        print("cell({}).value == {}".format((row,col), value))

Output:

cell((0, 1)).value == Cascade
cell((0, 2)).value == Market
cell((0, 0)).value == Schedule
cell((1, 1)).value == DO Macro Upgrade
cell((1, 2)).value == Upper Cnetral Valley
cell((1, 0)).value == SF05UB0
cell((2, 1)).value == DO Macro Upgrade
cell((2, 2)).value == Toledo
cell((2, 0)).value == DE03HO0
cell((3, 1)).value == DO Macro Upgrade
cell((3, 2)).value == SF Bay
cell((3, 0)).value == SF73XC4

Get the index of one column by name:

print("cell{}.value == {}".format((1, by_name['Schedule']), sheet.cell(1, by_name['Schedule']).value)) #>>> cell(1, 0).value == SF05UB0

Tested with Python: 3.5

You can make use of pandas. Below is sample code for identifying the columns and rows in an Excel sheet.

import pandas as pd

file_location =r"Your_Excel_Path"

# Read out first sheet of excel file and return as pandas dataframe
df = pd.read_excel(file_location)


total_rows=len(df.axes[0])
total_cols=len(df.axes[1])

# Print total number of rows in an excel sheet
print("Number of Rows: "+str(total_rows))

# Print total number of columns in an excel sheet
print("Number of Columns: "+str(total_cols))

# Print column names in an excel sheet
print(df.columns.tolist())

Once you have the column data, you can convert it into a list of values.

