
Sorting with xlwings (pywin32)

I need to use Python to sort an Excel spreadsheet by a given column. For testing, I'm using this data (in a file named xlwings sorting.xlsx):

Numbers Letters Letters_2
7   A   L
6   B   K
5   C   M
4   D   J
3   E   N
2   F   I
1   G   H

Which should be sorted into this:

Numbers Letters Letters_2
1   G   H
2   F   I
3   E   N
4   D   J
5   C   M
6   B   K
7   A   L

One would think this to be a trivial task, but I can't find any documentation on column sorting in either the xlwings docs or the pywin32 ones (if it exists, it's buried so deep that two days of reading hasn't uncovered it).

The closest thing I could find anywhere online was this question, which has no answer and just redirects to a GitHub bug thread that had no resolution.

Still, I have managed to cobble together the following code based on the questioner's:

import xlwings as xw
from xlwings.constants import SortOrder

bk = xw.Book(r'C:\Users\username\Documents\Test Files\xlwings sorting.xlsx')

sht = bk.sheets['Sheet1']

def xl_col_sort(sht,col_num):
    sht.range('a2').api.Sort(sht.range((2,col_num)).api,SortOrder.xlAscending)
    return

xl_col_sort(sht,1)

This runs, but I have no idea how the syntax works. I can't even tell why the first range('a2') call is necessary, but it throws an exception if I try calling sht.api.Sort directly. I tried looking at the code with IPython's ?? feature, but it just gives me <xlwings._xlwindows.COMRetryObjectWrapper object at 0x0000001375A459E8> with no docstring. I then tried to Ctrl+F through the .py files for the Sort() function, but ran into a dead end in a huge list of COM wrappers and couldn't track down the module that actually contains the function.

At any rate, even if I haven't a clue how, the test case works; so the next step is putting this function into a class that contains an excel workbook and sheet to use the function as a method. I rewrite the code both to be used as a method and to take strings instead of column numbers (new columns are added to the middle of the worksheet frequently, so the number would change often):

class Metrics:

    # self.sheet is a sheet object based on self.book opened with xlwings
    # a bunch of other methods and attributes

    def xl_col_sort(self,col):

        # +2 because excel starts at 1 (+1) and the dataframe self.df
        # uses a data column as the index (+1)
        col_num = np.where(self.df.columns == col)[0][0] + 2

        so = xw.constants.SortOrder

        self.sheet.range('a2').api.Sort(self.sheet.range((2,col_num)).api, so.xlAscending)
        return

I can't see that anything has functionally changed here. It's still receiving the same arguments, even if they go through an additional step to be created. Yet attempting to run this produces a MemoryError:

In[1]:    metrics.xl_col_sort('Exp. Date')
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-3-f1de8b0e8e98> in <module>()
----> 1 metrics.xl_col_sort('Exp. Date')

C:\Users\username\Documents\Projects\PyBev\pyBev_0-3-1\pybev\metricsobj.py in xl_col_sort(self, col)
    146         so = xw.constants.SortOrder
    147 
--> 148         self.sheet.range('a2').api.Sort(self.sheet.range((2,col_num)).api, so.xlAscending)
    149         return
    150     # def monday_backup(self):
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\main.py in range(self, cell1, cell2)
    818                 raise ValueError("Second range is not on this sheet")
    819             cell2 = cell2.impl
--> 820         return Range(impl=self.impl.range(cell1, cell2))
    821 
    822     @property
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\_xlwindows.py in range(self, arg1, arg2)
    576             if 0 in arg1:
    577                 raise IndexError("Attempted to access 0-based Range. xlwings/Excel Ranges are 1-based.")
--> 578             xl1 = self.xl.Cells(arg1[0], arg1[1])
    579         elif isinstance(arg1, numbers.Number) and isinstance(arg2, numbers.Number):
    580             xl1 = self.xl.Cells(arg1, arg2)
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\_xlwindows.py in __call__(self, *args, **kwargs)
    149         for i in range(N_COM_ATTEMPTS + 1):
    150             try:
--> 151                 v = self._inner(*args, **kwargs)
    152                 t = type(v)
    153                 if t is CDispatch:
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\win32com\client\dynamic.py in __call__(self, *args)
    190                 if invkind is not None:
    191                         allArgs = (dispid,LCID,invkind,1) + args
--> 192                         return self._get_good_object_(self._oleobj_.Invoke(*allArgs),self._olerepr_.defaultDispatchName,None)
    193                 raise TypeError("This dispatch object does not define a default method")
    194 
MemoryError: CreatingSafeArray

Does anyone know how the syntax of this thing works or why it's breaking when put inside the method?

This turned out to be an incredibly subtle error, so I figured I'd post the answer in case someone ends up googling this in a year trying to do something similar.

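In short, the sheet.range() method only accepts coordinates that are built-in Python integers, and the expression:

col_num = np.where(self.df.columns == col)[0][0] + 2

produces a NumPy integer scalar (such as numpy.int64) rather than a built-in int. It prints like an ordinary integer, which is what makes the error so hard to spot, but the COM layer apparently cannot convert it. Wrapping the expression in int() fixes the problem. Why this produces a MemoryError instead of a TypeError is beyond me, but it's probably an oversight. The devs do seem to know about it, though.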
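
One way to avoid the type problem entirely is to compute the column number with plain Python, so the result is always a built-in int. This is a minimal sketch, assuming the column headers are available as a sequence of strings (list(df.columns) would also work). The helper name excel_col_num and its index_cols parameter are hypothetical and not part of xlwings:

```python
def excel_col_num(columns, name, index_cols=1):
    """Return the 1-based Excel column number for `name` as a built-in int.

    index_cols is the number of worksheet columns taken up by the
    DataFrame index (1 in the question, hence the original "+ 2").
    """
    # list.index returns a plain Python int and raises ValueError
    # if the header is not present.
    position = list(columns).index(name)
    # int() is redundant here, but it is the cast that would also turn a
    # NumPy integer into a value the COM layer accepts.
    return int(position) + 1 + index_cols


# Hypothetical headers for illustration; only 'Exp. Date' appears in the question.
headers = ['Exp. Date', 'Numbers', 'Letters']
col_num = excel_col_num(headers, 'Exp. Date')
print(col_num, type(col_num).__name__)
```

The result can then be passed straight to sheet.range((2, col_num)).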

Additionally, the syntax is not listed in the aforementioned docs because .api exposes the underlying Excel COM object, so Sort() is actually the VBA Range.Sort method, as found here. Sort() only works on Range objects, hence the need for the first sht.range() call.

And finally, in case anyone wants a simplified function to encapsulate all this nonsense:

import xlwings as xw


bk = xw.Book(file_path)
sheet = bk.sheets['Sheet1'] # or whatever the sheet is named

def xl_col_sort(sheet,col_num):
    # int() guards against NumPy integers, which the COM call cannot convert
    key = sheet.range((2,int(col_num))).api
    # Order1=1 is xlAscending
    key.Sort(Key1=key, Order1=1)
    return
