
Unpacking **kwargs

I am working on a project that involves a lot of database filtering using Pandas. So I wrote the following function:

import sys

import numpy as np
import pandas as pd

def filterList(df, dropL, col, criteria, reason="", strCont=False, isIN=False,
               notEq=False, isEq=False, isNAN=False, isDup=False, useDropL=True,
               dropCol=False, dropColDropList=False, useDropReason=True):

    # make a mask
    if strCont:
        mask = df[col].str.contains(criteria)
    elif notEq:
        mask = df[col] != criteria
    elif isEq:
        mask = df[col] == criteria
    elif isNAN:
        mask = np.isnan(df[col])
    elif isIN:
        mask = df[col].isin(criteria)
    elif isDup:
        mask = df.duplicated(col, keep=False)
    else:
        print("you must specify how to make the mask")
        sys.exit()

    # fill the droplist
    if useDropL:
        # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
        dropL = pd.concat([dropL, df[mask]]).fillna("")
        dropL.reset_index(drop=True, inplace=True)
        if useDropReason:
            dropL.loc[dropL["Reason Dropped"] == '', 'Reason Dropped'] = reason
        if dropColDropList:
            dropL.drop(col, axis='columns', inplace=True)

    # filter the list
    df_Filtered = df.drop(df[mask].index)
    df_Filtered.reset_index(drop=True, inplace=True)

    # special instructions
    if dropCol:
        df_Filtered.drop(col, axis='columns', inplace=True)

    return df_Filtered, dropL

It's set up so that I have to pass one of the boolean variables as True in order to specify how the matching criteria should be compared to the specified column. It also tracks the dropped items and fills in a reason why each item was dropped (for manual error checking later).

I would like to not have such a long declaration statement. I mean, it works, I just think it looks ugly.

So I figured that I could use **kwargs to capture all the bools and then just look for the variable names in them, but everything I find on how to do that says it is the worst idea in the world.

The given reasons seem to revolve around not knowing what variables will be passed, and possible variable name collisions. But I'm the only one who will be writing or running this code, so I'm not worried about variable name collisions in this case.

So:

  1. Is my situation an acceptable one to directly cast the kwarg keys as variable names?
  2. If so (or if not), how else would I go about this? (I'm not at all familiar with kwargs, and only slightly familiar with dictionaries, which I understand kwargs is.)

Since the filtering criteria are mutually exclusive, you should just use a single parameter that specifies the filtering method, rather than lots of boolean parameters.

def filterList(df, dropL, col, criteria, filterType, reason="", useDropL=True,
               dropCol=False, dropColDropList=False, useDropReason=True):
    # criteria is still required: the mask expressions below compare against it
    if filterType == "strCont":
        mask = df[col].str.contains(criteria)
    elif filterType == "notEq":
        mask = df[col] != criteria
    ...
    else:
        print("you must specify how to make the mask")
        sys.exit()
    ...
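
A minimal, self-contained sketch of this single-parameter approach, assuming pandas is available. The `MASK_BUILDERS` table and the `make_mask` helper are hypothetical names introduced for illustration. Two things differ from the code above and are my own substitutions: `df[col].isna()` stands in for `np.isnan(df[col])`, and a `ValueError` replaces the `sys.exit()` call so that callers can handle a bad filter type:

```python
import pandas as pd

# Hypothetical lookup table: filter name -> function that builds a boolean mask.
MASK_BUILDERS = {
    "strCont": lambda df, col, criteria: df[col].str.contains(criteria),
    "notEq": lambda df, col, criteria: df[col] != criteria,
    "isEq": lambda df, col, criteria: df[col] == criteria,
    "isNAN": lambda df, col, criteria: df[col].isna(),
    "isIN": lambda df, col, criteria: df[col].isin(criteria),
    "isDup": lambda df, col, criteria: df.duplicated(col, keep=False),
}


def make_mask(df, col, criteria, filterType):
    """Return a boolean mask selecting the rows that should be dropped."""
    try:
        builder = MASK_BUILDERS[filterType]
    except KeyError:
        raise ValueError(
            f"unknown filterType {filterType!r}; "
            f"expected one of {sorted(MASK_BUILDERS)}"
        ) from None
    return builder(df, col, criteria)


# Small made-up frame to exercise two of the filter types.
df = pd.DataFrame({"name": ["a", "b", "a", "c"], "score": [1.0, None, 3.0, 4.0]})
dup_mask = make_mask(df, "name", None, "isDup")
nan_mask = make_mask(df, "score", None, "isNAN")
kept = df[~nan_mask].reset_index(drop=True)
```

Because the filter types live in one dictionary, adding a new one means adding a single entry instead of another boolean parameter and another `elif` branch.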

Not addressing your specific use-case, but there will be times when it's necessary to have a function that can take a whole lot of arguments, and those must be specific. Using kwargs isn't the worst idea in the world, but it would create two problems that you have to solve:

  1. The programmer or user calling the function gets no indication or documentation of what the function expects, which values can be passed, and so on.
  2. You would have to set up the default values yourself when they are missing from the kwargs dictionary, and you would also have to handle which items are optional and which are required.
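
To make those two costs concrete, here is a rough sketch of what a `**kwargs` version has to do by hand. The `DEFAULTS` dictionary and the `resolve_options` helper are hypothetical names used only for illustration; the option names are taken from the question's signature:

```python
# Hypothetical defaults table; with explicit parameters Python handles this for you.
DEFAULTS = {
    "reason": "",
    "useDropL": True,
    "dropCol": False,
    "dropColDropList": False,
    "useDropReason": True,
}


def resolve_options(**kwargs):
    """Merge caller-supplied options with the defaults, rejecting unknown names."""
    unknown = set(kwargs) - set(DEFAULTS)
    if unknown:
        # Without this check a typo such as dropcol=True would be silently ignored.
        raise TypeError(f"unexpected keyword argument(s): {sorted(unknown)}")
    # Later keys win, so caller-supplied values override the defaults.
    return {**DEFAULTS, **kwargs}


options = resolve_options(dropCol=True, reason="duplicate row")
```

The options are then read by key, for example `options["dropCol"]`, so there is no need to turn the dictionary keys into local variable names.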

Having said that, readability is also a factor, and you are right to be concerned about the "ugliness" of a long declaration statement. To solve that, I think it would be smarter to just change the format of the declaration. Writing something like this is totally acceptable, and much more readable:

def filterList(df,
               dropL,
               col,
               criteria,
               reason="",
               strCont=False,
               isIN=False,
               notEq=False,
               isEq=False,
               isNAN=False,
               isDup=False,
               useDropL=True,
               dropCol=False,
               dropColDropList=False,
               useDropReason=True):
    ...  # function body as in the question

And that format would even make it easier to add comments or type hints, if needed, to each parameter.
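
As a sketch of what that could look like, the signature below adds a type hint and a short comment to each parameter. The bare `*` (which makes the remaining parameters keyword-only) and the specific annotations are my assumptions, not part of the original function. Only a few of the parameters are shown, and the body is a stub that handles just the equality mode:

```python
import pandas as pd


def filterList(
    df: pd.DataFrame,       # table to filter
    dropL: pd.DataFrame,    # running record of dropped rows
    col: str,               # column the criteria are compared against
    criteria: object,       # value, pattern, or list, depending on the mode
    *,                      # everything after this must be passed by name
    reason: str = "",       # text for "Reason Dropped" (unused in this stub)
    isEq: bool = False,     # drop rows where df[col] == criteria
    dropCol: bool = False,  # also drop `col` from the filtered result
):
    """Stub: only the isEq mode is implemented; dropL is returned unchanged."""
    if isEq:
        mask = df[col] == criteria
    else:
        mask = pd.Series(False, index=df.index)
    filtered = df[~mask].reset_index(drop=True)
    if dropCol:
        filtered = filtered.drop(columns=col)
    return filtered, dropL
```

With keyword-only flags, a call such as `filterList(df, dropL, "x", 1, isEq=True)` reads unambiguously, and passing a flag positionally by mistake raises a `TypeError`.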

