Unpacking **kwargs

I am working on a project that involves a lot of database filtering using Pandas, so I wrote the following function:

import sys

import pandas as pd

def filterList(df, dropL, col, criteria, reason="", strCont=False, isIN=False,
               notEq=False, isEq=False, isNAN=False, isDup=False, useDropL=True,
               dropCol=False, dropColDropList=False, useDropReason=True):

    # make a mask
    if strCont:
        mask = df[col].str.contains(criteria)
    elif notEq:
        mask = df[col] != criteria
    elif isEq:
        mask = df[col] == criteria
    elif isNAN:
        mask = df[col].isna()  # unlike np.isnan, also works on non-numeric columns
    elif isIN:
        mask = df[col].isin(criteria)
    elif isDup:
        mask = df.duplicated(col, keep=False)
    else:
        print("you must specify how to make the mask")
        sys.exit()

    # fill the droplist (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    if useDropL:
        dropL = pd.concat([dropL, df[mask]]).fillna("")
        dropL.reset_index(drop=True, inplace=True)
        if useDropReason:
            dropL.loc[dropL["Reason Dropped"] == '', 'Reason Dropped'] = reason
        if dropColDropList:
            dropL.drop(col, axis='columns', inplace=True)

    # filter the list
    df_Filtered = df.drop(df[mask].index)
    df_Filtered.reset_index(drop=True, inplace=True)

    # special instructions
    if dropCol:
        df_Filtered.drop(col, axis='columns', inplace=True)

    return df_Filtered, dropL

It's set up so that I have to pass one of the boolean flags as True to specify how the matching criteria should be compared with the given column. It also tracks the dropped rows and fills in the reason each row was dropped (for manual error checking later).

I would like to avoid such a long declaration. I mean, it works; I just think it looks ugly.

So I figured I could use **kwargs to capture all the booleans and then just look up the variable names in it, but everywhere I look for how to do that, people say it is the worst idea in the world.

The reasons given seem to revolve around not knowing which variables will be passed, and possible variable-name collisions. But I'm the only one who will be writing or running this code, so I'm not worried about name collisions in this case.

So:

  1. Is my situation an acceptable one in which to turn the kwarg keys directly into variable names?

and

  2. Either way, how else would I go about this? (I'm not at all familiar with kwargs, and only slightly familiar with dictionaries, which I understand kwargs is.)
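On question 2: inside the function, **kwargs is just an ordinary dict holding the keyword arguments that did not match a named parameter, so there is no need to turn its keys into variables at all; you can look the flags up with dict.get. A minimal sketch, assuming only a few of the flags; make_mask is a hypothetical helper, not part of the original code:

```python
import pandas as pd

# Hypothetical helper: every boolean flag arrives in **kwargs, which is a
# plain dict mapping argument name -> value.
def make_mask(df, col, criteria, **kwargs):
    # kwargs.get(name, False) reads a flag and falls back to False when the
    # caller did not pass it; no variables are created from the keys.
    if kwargs.get("strCont", False):
        return df[col].str.contains(criteria)
    if kwargs.get("notEq", False):
        return df[col] != criteria
    if kwargs.get("isEq", False):
        return df[col] == criteria
    if kwargs.get("isIN", False):
        return df[col].isin(criteria)
    raise ValueError(f"no recognised mask flag in {sorted(kwargs)}")

df = pd.DataFrame({"name": ["apple", "banana", "cherry"]})
print(make_mask(df, "name", "an", strCont=True).tolist())  # [False, True, False]
```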

Since the filtering criteria are mutually exclusive, you should use a single parameter that specifies the filtering method rather than lots of boolean parameters.

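import sys

def filterList(df, dropL, col, criteria, filterType, reason="", useDropL=True,
               dropCol=False, dropColDropList=False, useDropReason=True):
    # filterType is a string naming the comparison, e.g. "strCont" or "notEq"
    if filterType == "strCont":
        mask = df[col].str.contains(criteria)
    elif filterType == "notEq":
        mask = df[col] != criteria
    # ... one elif per remaining filter type ("isEq", "isNAN", "isIN", "isDup")
    else:
        print("you must specify how to make the mask")
        sys.exit()
    # ... the rest of the function is unchanged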
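If the if/elif chain itself gets long, a common alternative is a lookup table that maps each filterType string to a function that builds the mask. This is a swapped-in technique, not the answerer's code; MASK_BUILDERS and build_mask are hypothetical names, and isNAN uses Series.isna() instead of np.isnan so it also works on non-numeric columns:

```python
import pandas as pd

# Hypothetical table-driven replacement for the if/elif chain: each allowed
# filterType maps to a function (series, criteria) -> boolean mask.
MASK_BUILDERS = {
    "strCont": lambda s, c: s.str.contains(c),
    "notEq":   lambda s, c: s != c,
    "isEq":    lambda s, c: s == c,
    "isNAN":   lambda s, c: s.isna(),
    "isIN":    lambda s, c: s.isin(c),
    # for a single column this matches df.duplicated(col, keep=False)
    "isDup":   lambda s, c: s.duplicated(keep=False),
}

def build_mask(df, col, criteria, filterType):
    try:
        builder = MASK_BUILDERS[filterType]
    except KeyError:
        # an unknown filterType fails loudly and lists the valid choices
        raise ValueError(
            f"filterType must be one of {sorted(MASK_BUILDERS)}, got {filterType!r}"
        ) from None
    return builder(df[col], criteria)

df = pd.DataFrame({"id": [1, 2, 2, 3], "score": [0.5, None, 0.7, 0.9]})
print(build_mask(df, "id", None, "isDup").tolist())     # [False, True, True, False]
print(build_mask(df, "score", None, "isNAN").tolist())  # [False, True, False, False]
```

A side benefit is that the set of valid filter types lives in one place, so the error message can never drift out of sync with what the function actually supports.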

This doesn't address your specific use case, but there will be times when you need a function that takes a whole lot of arguments, and those arguments must be specific. Using kwargs isn't the worst idea in the world, but it creates two problems that you have to solve:

  1. The programmer or user calling that function would have no indication or documentation of what the function expects to receive, what values can be passed, etc.
  2. You would have to supply default values yourself for any keys missing from the kwargs dict, and you would also have to handle which items are optional and which are required.
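To make those two problems concrete, here is a minimal sketch of the bookkeeping you would end up writing by hand. The DEFAULTS and REQUIRED tables and filter_options are hypothetical and cover only a few of the options:

```python
# Hypothetical option tables; a real version would list every option.
DEFAULTS = {"reason": "", "useDropL": True, "dropCol": False}
REQUIRED = {"col"}

def filter_options(**kwargs):
    # required keys: named parameters would get this check for free
    missing = REQUIRED - set(kwargs)
    if missing:
        raise TypeError(f"missing required option(s): {sorted(missing)}")
    # unknown keys: a typo such as useDropl=False would otherwise be ignored
    unknown = set(kwargs) - set(DEFAULTS) - REQUIRED
    if unknown:
        raise TypeError(f"unexpected option(s): {sorted(unknown)}")
    # caller-supplied values override the defaults
    return {**DEFAULTS, **kwargs}

print(filter_options(col="Price", dropCol=True))
# {'reason': '', 'useDropL': True, 'dropCol': True, 'col': 'Price'}
```

With ordinary named parameters, Python already does all of this: defaults are filled in, and a missing required argument or a misspelled keyword raises TypeError.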

Having said that, readability is also a factor, and you are right to be concerned about the "ugliness" of a large declaration. To solve that, I think it is smarter to just change the format of the declaration. Writing something like this is totally acceptable, and much more readable:

def filterList(df,
               dropL,
               col,
               criteria,
               reason="",
               strCont=False,
               isIN=False,
               notEq=False,
               isEq=False,
               isNAN=False,
               isDup=False,
               useDropL=True,
               dropCol=False,
               dropColDropList=False,
               useDropReason=True):
    ...  # body unchanged

That format also makes it easier to add comments or type hints to each parameter, if needed.
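As a sketch of what that looks like, here is the same declaration with a type hint and a short comment on each parameter. The hints and comments are my reading of what each parameter does in the question's code (assumptions, not something the asker specified), and the body is elided with `...`:

```python
import inspect
from typing import Any, Tuple

import pandas as pd

def filterList(df: pd.DataFrame,                # table to filter
               dropL: pd.DataFrame,             # accumulates the dropped rows
               col: str,                        # column the criteria apply to
               criteria: Any,                   # value, or list of values for isIN
               reason: str = "",                # written to "Reason Dropped"
               strCont: bool = False,           # mask: df[col] contains criteria
               isIN: bool = False,              # mask: df[col] is in criteria
               notEq: bool = False,             # mask: df[col] != criteria
               isEq: bool = False,              # mask: df[col] == criteria
               isNAN: bool = False,             # mask: df[col] is NaN
               isDup: bool = False,             # mask: duplicated values in col
               useDropL: bool = True,           # add dropped rows to dropL
               dropCol: bool = False,           # drop col from the filtered result
               dropColDropList: bool = False,   # drop col from dropL as well
               useDropReason: bool = True,      # fill in "Reason Dropped"
               ) -> Tuple[pd.DataFrame, pd.DataFrame]:
    ...  # body unchanged

# The hints are visible to editors, help() and the inspect module.
print(inspect.signature(filterList).parameters["reason"])
```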
