Excel using win32com and python

I was wondering how to read an entire column from an Excel sheet, without iterating, using the win32com client for Python.

You can read an entire column from a sheet without iterating by using the Range collection. You should never use Cells if performance is any concern. Python uses the win32com module to interact with the Excel COM library. Whenever you use Python and COM (Excel, PowerPoint, Access, ADODB, etc.), one of your biggest performance constraints will be I/O between COM and Python. With the Range method you make only one COM method call, while with Cells you make one for each row. Range would also be faster if you were doing the same thing in VBA or .NET.
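A minimal sketch of the single-call approach described above. The helper name `read_column` and the stand-in classes `FakeRange` and `FakeSheet` are hypothetical, introduced only so the snippet runs without Excel; with a real worksheet you would pass the sheet object obtained through `win32com.client.Dispatch("Excel.Application")` instead.

```python
def read_column(sheet, column, first_row, last_row):
    """Read one column with a single Range(...).Value call.

    `sheet` is assumed to expose the Excel-style Range(address).Value
    interface, as a win32com worksheet object does.
    """
    address = f"{column}{first_row}:{column}{last_row}"
    values = sheet.Range(address).Value
    if first_row == last_row:
        # A single-cell range yields a bare value rather than nested rows.
        return [values]
    # A multi-cell range yields one inner sequence per row.
    return [row[0] for row in values]


class FakeRange:
    """Hypothetical stand-in for an Excel Range object."""
    def __init__(self, value):
        self.Value = value


class FakeSheet:
    """Hypothetical stand-in for a worksheet; counts Range calls."""
    LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def __init__(self, column_values):
        self._values = column_values
        self.range_calls = 0

    def Range(self, address):
        self.range_calls += 1
        start, end = address.split(":")
        first = int(start.lstrip(self.LETTERS))
        last = int(end.lstrip(self.LETTERS))
        cells = self._values[first - 1:last]
        if len(cells) == 1:
            return FakeRange(cells[0])
        return FakeRange(tuple((v,) for v in cells))


sheet = FakeSheet(["a", "b", "c", "d"])
print(read_column(sheet, "A", 1, 4))   # the four values as a flat list
print(sheet.range_calls)               # number of Range calls issued
```

The whole column arrives in one call, so the cost of crossing the COM boundary is paid once instead of once per row.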

In the following test I created a worksheet with 10 random characters in each of the cells A1 through A2000. I then extracted these values into lists using both Range and Cells.

import time
import win32com.client

# Attach to Excel and take the first sheet of the active workbook.
app = win32com.client.Dispatch("Excel.Application")
s = app.ActiveWorkbook.Sheets(1)

def GetValuesByCells():
    # One COM round trip per cell.
    startTime = time.time()
    vals = [s.Cells(r,1).Value for r in range(1,2001)]
    return time.time() - startTime

def GetValuesByRange():
    # One COM call for the whole range; keep the first item of each row.
    startTime = time.time()
    vals = [v[0] for v in s.Range('A1:A2000').Value]
    return time.time() - startTime

>>> GetValuesByRange()
0.03600001335144043

>>> GetValuesByCells()
5.27400016784668

In this case Range is 2 orders of magnitude (146x) faster than Cells. Note that Range(...).Value returns a 2D sequence (a tuple of tuples) where each inner tuple is a row. The list comprehension takes the first element of each row, which turns vals into a flat list of the column's values.
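A small sketch of working with that row-oriented result. The hard-coded `rows` tuple is made-up sample data standing in for what `Range(...).Value` hands back for a multi-cell range, and the names `first_column` and `columns` are illustrative only.

```python
# Stand-in for s.Range('A1:B3').Value: one inner tuple per worksheet row.
rows = (
    ("a1", "b1"),
    ("a2", "b2"),
    ("a3", "b3"),
)

# Take the first element of every row to get column A as a flat list.
first_column = [row[0] for row in rows]

# zip(*rows) regroups the values so that each tuple holds one column.
columns = list(zip(*rows))

print(first_column)  # ['a1', 'a2', 'a3']
print(columns)       # [('a1', 'a2', 'a3'), ('b1', 'b2', 'b3')]
```

The `zip(*rows)` form is handy when the range spans several columns and you want each of them as its own sequence.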

Have you looked into the openpyxl library? From the documentation:

from openpyxl import load_workbook
wb = load_workbook(filename='file.xlsx')
ws = wb['Sheet1']  # get_sheet_by_name() is deprecated in current openpyxl
columns = list(ws.columns)  # ws.columns is a property, not a method

There's also support for iterators and other goodies.
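As a rough illustration of the iterator support, the sketch below builds a tiny workbook in memory and walks it column by column with `iter_cols`. The sample rows are made up, and the snippet assumes a reasonably recent openpyxl release, since the `values_only` argument is not available in old versions.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook in memory so the example needs no file on disk.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
for row in [("name", "qty"), ("apple", 3), ("pear", 5)]:
    ws.append(row)
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Reload it and iterate column by column; values_only=True yields plain
# cell values instead of Cell objects.
loaded = load_workbook(buffer)
sheet = loaded["Sheet1"]
columns = [list(col) for col in sheet.iter_cols(values_only=True)]
print(columns)
```

With a real file you would pass its path to `load_workbook` instead of the in-memory buffer.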

The fastest way would be to use the built-in Range functionality through the win32com.client API. However, I'm not a big fan of it. I think the API is confusing and badly documented, and using it isn't very pythonic (but that's just me).

If efficiency is not an issue for you, you can use the excellent xlrd library. Like so:

import xlrd
book = xlrd.open_workbook('Book1.xls')  # path to the workbook file
sheet = book.sheet_by_name('Sheet1')
# Column indices are 0-based, so col(1) is the second column.
sheet.col(1)
sheet.col(2)
# and so on...

That gives you the cell objects. To get pure values, use sheet.col_values (and there are a few other methods that are really nice to work with).

Just remember that xlrd stands for "excel read", so if you want to write to an Excel file you need a different library called "xlwt" (which is also pretty good, though less so than xlrd in my opinion).
