Why is win32com so much slower than xlrd?

Question

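I have the same code written with win32com and with xlrd. The xlrd version finishes the algorithm in under a second, while the win32com version takes several minutes.

Here is the win32com version: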

def makeDict(ws):
    """makes dict with key as header name,
       value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    for cnum in xrange(9, find_last_col(ws)):
        if ws.Cells(7, cnum).Value:
            wsHeaders[str(ws.Cells(7, cnum).Value)] = (cnum, find_last_col(ws))
            for cend in xrange(cnum + 1, find_last_col(ws)): #finds end column
                if ws.Cells(7, cend).Value:
                    wsHeaders[str(ws.Cells(7, cnum).Value)] = (cnum, cend - 1)
                    break
    return wsHeaders

And the xlrd version:

def makeDict(ws):
    """makes dict with key as header name,
       value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    for cnum in xrange(8, ws.ncols):
        if ws.cell_value(6, cnum):
            wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)
            for cend in xrange(cnum + 1, ws.ncols): #finds end column
                if ws.cell_value(6, cend):
                    wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, cend - 1)
                    break
    return wsHeaders

Answer 1

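(0) You asked "Why is win32com so much slower than xlrd?" ... this question is a bit like "Have you stopped beating your wife?": it rests on a presupposition that may not be true. win32com was written in C by a brilliant programmer, whereas xlrd was written in pure Python by an average programmer. The real difference is that win32com has to call COM, which involves inter-process communication and was written by you-know-who, whereas xlrd reads the Excel file directly. Moreover, there is a fourth party in the scenario: YOU. Please read on.

(1) You haven't shown us the source of the find_last_col() function that you call repeatedly in the COM code. In the xlrd code, you are happy to use the same value (ws.ncols) throughout. So in the COM code, you should call find_last_col(ws) ONCE and use the returned result from then on. Update: see the answer to your separate question for how to get the equivalent of xlrd's Sheet.ncols from COM.

(2) Accessing each cell value TWICE slows down both versions. Instead of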

if ws.cell_value(6, cnum):
    wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)

try

value = ws.cell_value(6, cnum)
if value:
    wsHeaders[str(value)] = (cnum, ws.ncols)

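Note: there are 2 cases of this in each code snippet.

(3) The purpose of your nested loops is not at all obvious, but there does seem to be some redundant computation, involving redundant fetches from COM. If you care to tell us, with examples, what you are trying to achieve, we may be able to help you make it run much faster. At the very least, extracting the values from COM once and then processing them in nested loops in Python should be faster. How many columns are there?

Update 2: Meanwhile, the little elves took the proctoscope to your code and came up with the following script: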
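As a rough illustration of "extracting the values from COM once", the sketch below fetches a whole row segment through a single Range.Value call instead of one Cells(...).Value call per column. It assumes `ws` is a win32com worksheet object; the helper name `read_row_values` is made up for this example, and it is written for Python 3 rather than the Python 2 used elsewhere in this thread.

```python
def read_row_values(ws, row, first_col, last_col):
    """Return the cells of one row segment as a list, using one Range.Value fetch.

    `ws` is assumed to be a win32com worksheet (1-based row and column numbers).
    `read_row_values` is a hypothetical helper name, not part of any library.
    """
    cells = ws.Range(ws.Cells(row, first_col), ws.Cells(row, last_col))
    block = cells.Value
    if first_col == last_col:
        # A single-cell range yields a bare value rather than nested tuples.
        return [block]
    # A multi-cell range yields a tuple of row tuples; only one row was requested.
    return list(block[0])
```

With the question's layout this might be called as `values = read_row_values(ws, 7, 9, last_col)`, after which every further test and loop works on the plain Python list and makes no more COM calls. Empty cells come back as None, which is falsy, so the existing `if value:` checks keep working.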

tests= [
    "A/B/C/D",
    "A//C//",
    "A//C//E",
    "A///D",
    "///D",
    ]
for test in tests:
    print "\nTest:", test
    row = test.split("/")
    ncols = len(row)
    # modelling the OP's code
    # (using xlrd-style 0-relative column indexes)
    d = {}
    for cnum in xrange(ncols):
        if row[cnum]:
            k = row[cnum]
            v = (cnum, ncols) #### BUG; should be ncols - 1 ("inclusive")
            print "outer", cnum, k, '=>', v
            d[k] = v
            for cend in xrange(cnum + 1, ncols):
                if row[cend]:
                    k = row[cnum]
                    v = (cnum, cend - 1)
                    print "inner", cnum, cend, k, '=>', v
                    d[k] = v
                    break
    print d
    # modelling a slightly better algorithm
    d = {}
    prev = None
    for cnum in xrange(ncols):
        key = row[cnum]
        if key:
            d[key] = [cnum, cnum]
            prev = key
        elif prev:
            d[prev][1] = cnum
    print d
    # if tuples are really needed (can't imagine why)
    for k in d:
        d[k] = tuple(d[k])
    print d

which outputs this:

Test: A/B/C/D
outer 0 A => (0, 4)
inner 0 1 A => (0, 0)
outer 1 B => (1, 4)
inner 1 2 B => (1, 1)
outer 2 C => (2, 4)
inner 2 3 C => (2, 2)
outer 3 D => (3, 4)
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 4)}
{'A': [0, 0], 'C': [2, 2], 'B': [1, 1], 'D': [3, 3]}
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 3)}

Test: A//C//
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
{'A': (0, 1), 'C': (2, 5)}
{'A': [0, 1], 'C': [2, 4]}
{'A': (0, 1), 'C': (2, 4)}

Test: A//C//E
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
inner 2 4 C => (2, 3)
outer 4 E => (4, 5)
{'A': (0, 1), 'C': (2, 3), 'E': (4, 5)}
{'A': [0, 1], 'C': [2, 3], 'E': [4, 4]}
{'A': (0, 1), 'C': (2, 3), 'E': (4, 4)}

Test: A///D
outer 0 A => (0, 4)
inner 0 3 A => (0, 2)
outer 3 D => (3, 4)
{'A': (0, 2), 'D': (3, 4)}
{'A': [0, 2], 'D': [3, 3]}
{'A': (0, 2), 'D': (3, 3)}

Test: ///D
outer 3 D => (3, 4)
{'D': (3, 4)}
{'D': [3, 3]}
{'D': (3, 3)}
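For readers on Python 3, the "slightly better algorithm" from the script above can be wrapped in a function along these lines. This is only a sketch: the name `header_spans` and the `first_col` offset parameter are introduced here for illustration and do not appear in the original script.

```python
def header_spans(row, first_col=0):
    """Map each non-empty header to (first column, last column), both inclusive.

    `row` is any sequence of cell values; empty cells may be "" or None.
    `first_col` is the column number of row[0] (0 for xlrd-style indexes).
    """
    spans = {}
    prev = None
    for cnum, value in enumerate(row, start=first_col):
        if value:
            # A new header starts here; so far it covers just this column.
            prev = str(value)
            spans[prev] = (cnum, cnum)
        elif prev is not None:
            # Empty cell: extend the most recent header up to this column.
            spans[prev] = (spans[prev][0], cnum)
    return spans
```

Each cell is examined once, so the work grows linearly with the number of columns, and the last header's end column is inclusive, which avoids the `ncols` versus `ncols - 1` bug flagged in the script.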

Answer 2

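COM has to talk to another process, which actually handles the requests. xlrd works in-process, directly on the data structures themselves.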
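Because every COM cell read is a round trip to the Excel process, the number of reads matters more than the Python work around them. The sketch below does not use COM at all: `CountingRow` is a hypothetical stand-in that simply counts reads, so that the access pattern of the question's nested loops can be compared with reading each cell once into a list.

```python
class CountingRow:
    """Hypothetical stand-in for a header row; counts every cell read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def __len__(self):
        return len(self._values)

    def cell(self, col):
        # In the COM version, each of these reads would cross a process boundary.
        self.reads += 1
        return self._values[col]


def reads_with_nested_loops(values):
    """Replay the access pattern of the question's nested loops; return the read count."""
    row = CountingRow(values)
    ncols = len(row)
    headers = {}
    for cnum in range(ncols):
        if row.cell(cnum):
            headers[str(row.cell(cnum))] = (cnum, ncols)
            for cend in range(cnum + 1, ncols):
                if row.cell(cend):
                    headers[str(row.cell(cnum))] = (cnum, cend - 1)
                    break
    return row.reads


def reads_with_snapshot(values):
    """Read every cell exactly once into a list; later work touches only the list."""
    row = CountingRow(values)
    snapshot = [row.cell(cnum) for cnum in range(len(row))]
    assert len(snapshot) == len(row)
    return row.reads
```

For a nine-column row with three headers, such as `["A", "", "", "B", "", "", "C", "", ""]`, the nested pattern performs 22 reads against 9 for the snapshot. The gap widens as headers span more columns, and with COM each extra read carries inter-process overhead that the in-process xlrd calls do not.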

Answer 3

I thought about it as I was going to bed last night and ended up with this, a far superior version to my original:

def makeDict(ws):
    """makes dict with key as header name,
       value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    last_col = find_last_col(ws)

    for cnum in xrange(9, last_col):
        if ws.Cells(7, cnum).Value:
            value = ws.Cells(7, cnum).Value
            cstart = cnum
        if ws.Cells(7, cnum + 1).Value:
            wsHeaders[str(value)] = (cstart, cnum) #cnum is last in range
    return wsHeaders

Answer 1: score 12, accepted, 2010-06-03 23:46:22
Answer 2: score 2, 2010-06-03 19:48:17
Answer 3: score 0, 2010-06-04 14:48:34