Wrapper for pandas function of a dataframe, which reads a csv from file as the dataframe - df not defined error

I often write functions that operate on a dataframe and take additional arguments. I'd like to write a general function that I can wrap around this sort of function, which will load a .csv file as a dataframe, then use that dataframe in the function. I'd also like the option to save the output as another .csv file in some cases, by giving the function a file location at which to save the .csv.

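The problem I'm running into is that this isn't a plain decorator function, because it takes additional parameters, namely the file locations (for loading a .csv and sometimes for saving one). But I also don't want to write this function separately for every function I want to do this with (in which case I would just pass all arguments of the contained function to the wrapping function).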

My current attempt is below. I'm running it in a Jupyter notebook, so it just saves the .csv in the main directory and loads it from there.

import pandas as pd

a=[1,2,3,4]
b=[5,3,7,2]
testdf=pd.DataFrame(list(zip(a,b)),columns=['A','B'])

file_in_location='test.csv'
testdf.to_csv(file_in_location)

def open_file_and_run_wrapper(func,file_in_location,file_out_location='',save_output=False,delimiter=','):
    '''
    Function that opens a file as a dataframe and runs it through the given function
    '''
    if save_output==True:
        if file_out_location=='':
            # raise exception
            print('error: must have file output location')

    df=pd.read_csv(file_in_location,delimiter=delimiter)

    if save_output==True:
        df.to_csv(file_out_location,delimiter=delimiter)

    return func(df=df,*args,**kwargs)

def df_function(df,add_colname,value):
    df[add_colname]=value
    return df

open_file_and_run_wrapper(
    df_function(df,'C',4),
    file_in_location,
)

This returns the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-d174cd4d8bbc> in <module>
     29 
     30 open_file_and_run_wrapper(
---> 31     df_function(df,'C',4),
     32     file_in_location,
     33 )

NameError: name 'df' is not defined

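This isn't surprising, since no dataframe is defined at the point where I call the function. However, it will be defined by the wrapper function. How do I create a general wrapper/decorator function that allows additional arguments?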
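The cause can be seen in isolation: Python evaluates every argument expression before it calls the outer function, so `df_function(df,'C',4)` runs first, at a point where `df` does not exist yet. The sketch below uses hypothetical names (`apply_to`, `double`, `not_yet_defined`) to contrast passing a function object with passing a call expression.

```python
def apply_to(func, value):
    """Call func on value; stands in for the wrapper in the question."""
    return func(value)


def double(x):
    return 2 * x


# Passing the function object: apply_to decides when to call it.
result = apply_to(double, 21)

# Passing a call expression: Python evaluates double(not_yet_defined) first,
# so the NameError is raised before apply_to ever runs.
try:
    apply_to(double(not_yet_defined), 21)
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(result)
print(error_message)
```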

Here is how to write (and call) the wrapper:

# notice the additional *args and **kwargs
def open_file_and_run_wrapper(func, file_in_location,
                              *args,                 
                              file_out_location='',
                              save_output=False, 
                              delimiter=',', **kwargs):
    '''
    Function that opens a file as a dataframe and runs it through the given function
    '''
    if save_output and file_out_location=='':
        raise ValueError('must have file output location')

    df=pd.read_csv(file_in_location,delimiter=delimiter)

    # note how we pass the additional parameters:
    # `df` is passed positionally, because `func(df=df,*args)` would
    # clash with any extra positional arguments
    result=func(df,*args,**kwargs)

    if save_output:
        # save the processed dataframe; `to_csv` takes `sep`, not `delimiter`
        result.to_csv(file_out_location,sep=delimiter)

    return result

def df_function(df,add_colname,value):
    df[add_colname]=value
    return df

Now we can call the wrapper, passing the additional parameters as keyword arguments:

open_file_and_run_wrapper(
    df_function, 
    file_in_location,
    add_colname='C', value=4
)

Or we can call it with positional arguments, although this is less readable:

open_file_and_run_wrapper(
    df_function, 
    file_in_location,
    'C', 4       # positional arguments here
)

Output:

   Unnamed: 0  A  B  C
0           0  1  5  4
1           1  2  3  4
2           2  3  7  4
3           3  4  2  4
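The `Unnamed: 0` column in this output comes from the setup code rather than from the wrapper: `testdf.to_csv(file_in_location)` writes the index as an unnamed first column, and `pd.read_csv` then reads it back as data. A minimal sketch of one way to avoid it, assuming the index does not need to be kept, is to write with `index=False`. An in-memory `io.StringIO` buffer stands in for the file here.

```python
import io

import pandas as pd

testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 3, 7, 2]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
testdf.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written.
without_index = io.StringIO()
testdf.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'A', 'B']
print(cols_without_index)  # ['A', 'B']
```

Alternatively, `pd.read_csv(file_in_location, index_col=0)` reads that first column back as the index instead of as a data column.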

You can approach it like this: pass the function as an object, then pass its positional arguments as a list and its keyword arguments as a dictionary. It would look like this:

def open_file_and_run_wrapper(
    func,
    file_in_location,
    func_args=[],
    func_kwargs={},
    file_out_location=None,
    delimiter=",",
):
    """
    Function that opens a file as a dataframe and runs it through the given function
    """

    df = pd.read_csv(file_in_location, delimiter=delimiter)
    processed_df = func(df, *func_args, **func_kwargs)

    if file_out_location is not None:
        # DataFrame.to_csv takes `sep`, not `delimiter`
        processed_df.to_csv(file_out_location, sep=delimiter)

    return processed_df


def df_function(df, add_colname, value):
    df[add_colname] = value
    return df


open_file_and_run_wrapper(
    df_function, file_in_location, func_args=["C"], func_kwargs={"value": 5}
)

I have made a few changes to your code, so I hope I haven't changed what you expected it to do.

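  • func_args takes a list or tuple (in fact any sequence), which is passed to the function as positional arguments
  • func_kwargs takes a dictionary-like object, which is passed to the function as keyword arguments
  • Removed save_output in favor of checking whether file_out_location is given in order to save the function's output (if no file_out_location is provided, no output is saved to a file).
  • Moved the call to to_csv so that it saves the newly created dataframe rather than the same dataframe that was read from the file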
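The same design can also be packaged as a decorator, which is what the question originally asked about. The following is only a sketch under a few assumptions: `with_csv_io` and `add_constant_column` are hypothetical names, the I/O settings are keyword-only so they cannot collide with the wrapped function's positional arguments, and `index=False` is an added choice so that the saved file does not gain an extra index column.

```python
import functools
import os
import tempfile

import pandas as pd


def with_csv_io(func):
    """Hypothetical decorator: the wrapped function takes a file path
    in place of the dataframe, plus keyword-only I/O settings."""
    @functools.wraps(func)
    def wrapper(file_in_location, *args, file_out_location=None,
                delimiter=',', **kwargs):
        df = pd.read_csv(file_in_location, delimiter=delimiter)
        result = func(df, *args, **kwargs)
        if file_out_location is not None:
            # Save the processed dataframe, not the one read from disk.
            result.to_csv(file_out_location, sep=delimiter, index=False)
        return result
    return wrapper


@with_csv_io
def add_constant_column(df, add_colname, value):
    df[add_colname] = value
    return df


with tempfile.TemporaryDirectory() as tmp:
    path_in = os.path.join(tmp, 'in.csv')
    path_out = os.path.join(tmp, 'out.csv')
    pd.DataFrame({'A': [1, 2], 'B': [5, 3]}).to_csv(path_in, index=False)

    result = add_constant_column(path_in, 'C', 4, file_out_location=path_out)
    saved = pd.read_csv(path_out)

print(list(result.columns))
print(saved['C'].tolist())
```

One limitation of this sketch is that the wrapped function cannot itself have parameters named `file_out_location` or `delimiter`, since the wrapper consumes them.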

What you want is an object, not a function.

class DataWrapper:

    def run(self, df):
        raise NotImplementedError

    def open_and_run(self, file_in_location, delimiter=','):
        df = pd.read_csv(file_in_location, delimiter=delimiter)
        return self.run(df)

    def open_run_and_save(self, file_in_location, file_out_location,  delimiter=','):
        df_result = self.open_and_run(file_in_location, delimiter)
        # DataFrame.to_csv takes `sep`, not `delimiter`
        df_result.to_csv(file_out_location, sep=delimiter)
        return df_result

The function you want to wrap is implemented in the run method, and its parameters are passed to the initializer:

class AddConstantColumnWrapper(DataWrapper):

    def __init__(self, colname, value):
        super().__init__()
        self.colname = colname
        self.value = value
 
    def run(self, df):
        df[self.colname] = self.value
        return df

Then you can call the object to do what you need:

wrapper = AddConstantColumnWrapper('C',4)
df_result = wrapper.open_and_run(file_in_location)

Passing a dictionary of arguments as a parameter is usually a sign that an object-oriented design is called for.
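For cases where the only state is a few bound arguments, a lighter alternative to a class is `functools.partial` from the standard library, which binds the extra arguments ahead of time and leaves the dataframe to be supplied later. This is a sketch rather than part of the answer above: `open_and_run` is a simplified, hypothetical stand-in for the wrapper, and an in-memory `io.StringIO` buffer stands in for the .csv file.

```python
import functools
import io

import pandas as pd


def df_function(df, add_colname, value):
    df[add_colname] = value
    return df


def open_and_run(func, file_in_location, delimiter=','):
    """Simplified stand-in for the wrapper: func takes only a dataframe."""
    df = pd.read_csv(file_in_location, delimiter=delimiter)
    return func(df)


# Bind the extra arguments now; the dataframe is supplied later by the wrapper.
add_c = functools.partial(df_function, add_colname='C', value=4)

csv_source = io.StringIO('A,B\n1,5\n2,3\n')
result = open_and_run(add_c, csv_source)

print(list(result.columns))
print(result['C'].tolist())
```

With this approach the wrapper itself needs no `*args` or `**kwargs`, at the cost of building the partial object at each call site.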
