

Creating a class based on pandas.DataFrame using the pandas.read_csv() function to initialize

My goal is to create an object that behaves the same as a pandas DataFrame, but with a few extra methods of my own on top of it. As far as I understand, one approach would be to extend the class, which I first tried to do as follows:

import pandas as pd

class CustomDF(pd.DataFrame):
    def __init__(self, filename):
        # Note: this only rebinds the local name `self`
        self = pd.read_csv(filename)

But when I try to view this object, I get the error: 'CustomDF' object has no attribute '_data'.

My second iteration was not to inherit from DataFrame at all, but instead to load the data as a DataFrame into one of the object's attributes and have the methods work with that attribute, like this:

import pandas as pd

class CustomDF:

    def __init__(self, filename):
        # The DataFrame lives in an attribute instead of being `self`
        self.df = pd.read_csv(filename)

    def custom_method_1(self, a, b, *args):
        ...

    def custom_method_2(self, a, b, *args):
        ...

This works, except that every custom method has to go through the self.df attribute before it can do anything, whereas I would prefer my custom DataFrame to simply be self.

Is there a way to do this? Or is this approach not ideal anyway?

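In your first example, __init__ is overridden without ever calling DataFrame.__init__, and assigning to self inside it only rebinds a local variable, so the object is never properly initialised.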
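To illustrate why the first attempt fails, here is a minimal plain-Python sketch with no pandas involved (the class name Wrapper is hypothetical). Assigning to `self` inside `__init__` only changes which object the local name points to, so the instance the caller receives never gets the data.

```python
class Wrapper:
    def __init__(self, value):
        # Rebinding `self` only changes the local variable inside __init__;
        # the instance handed back to the caller is left untouched.
        self = value


w = Wrapper([1, 2, 3])
print(type(w).__name__)       # Wrapper, not list
print(hasattr(w, "append"))   # False: the list was never attached
```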

Use super() to call the parent constructor first, and then add your custom code:

import pandas as pd

class CustomDF(pd.DataFrame):
    def __init__(self, *args, **kw):
        # Let DataFrame set up its internal state first
        super().__init__(*args, **kw)
        # Your code here

    def custom_method_1(self, a, b, *args):
        ...
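A fuller sketch of the subclassing route, assuming a reasonably recent pandas. The pandas documentation on subclassing recommends overriding the `_constructor` property so that operations such as `head()` or slicing return the subclass rather than a plain DataFrame. The `from_csv` classmethod and `column_range` method are hypothetical names chosen for illustration. An alternative constructor is used here so that `__init__` keeps DataFrame's own signature, since pandas may call the class internally when it builds new objects.

```python
import io

import pandas as pd


class CustomDF(pd.DataFrame):
    # Tell pandas to build CustomDF (not a plain DataFrame) as the result
    # of operations such as slicing or head().
    @property
    def _constructor(self):
        return CustomDF

    @classmethod
    def from_csv(cls, filepath_or_buffer, **kwargs):
        # Hypothetical helper: read with pandas, then wrap the result.
        return cls(pd.read_csv(filepath_or_buffer, **kwargs))

    def column_range(self, column):
        # Hypothetical custom method: max minus min of one column.
        return self[column].max() - self[column].min()


df = CustomDF.from_csv(io.StringIO("a,b\n1,10\n4,20\n2,30\n"))
print(type(df).__name__)          # CustomDF
print(df.column_range("a"))       # 3
print(type(df.head(2)).__name__)  # CustomDF
```

With a real file this would be `CustomDF.from_csv("data.csv")`. Because the object is a DataFrame, the usual methods are available directly on `self` inside custom methods.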

Is this what you were looking for?

import pandas as pd

class CustomDF:

    def __init__(self, filename):
        self.df = pd.read_csv(filename)

    def custom_method_1(self, *args, **kwargs):
        # do_custom_operations_on is a placeholder for your own logic
        result_1 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_1

    def custom_method_2(self, *args, **kwargs):
        result_2 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_2

    ...

I would probably go with the decorator pattern here. The accepted answer to this post will put you on the right track.
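One way to read the decorator-pattern suggestion is a wrapper that forwards any attribute it does not define itself to the wrapped DataFrame via `__getattr__`. This is a sketch of that idea, not necessarily what the linked answer does; `column_range` is a hypothetical custom method.

```python
import io

import pandas as pd


class CustomDF:
    """Wraps a DataFrame and forwards unknown attributes to it."""

    def __init__(self, filepath_or_buffer, **kwargs):
        self.df = pd.read_csv(filepath_or_buffer, **kwargs)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for anything not
        # defined on CustomDF itself.
        if name == "df":
            # Avoid infinite recursion if df has not been set yet.
            raise AttributeError(name)
        return getattr(self.df, name)

    def __getitem__(self, key):
        # Special methods bypass __getattr__, so forward indexing explicitly.
        return self.df[key]

    def __len__(self):
        return len(self.df)

    def column_range(self, column):
        # Hypothetical custom method.
        return self.df[column].max() - self.df[column].min()


wrapped = CustomDF(io.StringIO("a,b\n1,10\n4,20\n"))
print(wrapped.shape)              # (2, 2), forwarded to the DataFrame
print(wrapped["b"].sum())         # 30
print(wrapped.column_range("a"))  # 3
```

The trade-off is that forwarded methods return plain DataFrames, not wrapped objects, and every special method you need (`len()`, `[]`, arithmetic) has to be forwarded by hand.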

I can see that your first iteration would be really neat, but it seems to me that you would need to know quite a lot about pandas' internals, e.g. that this _data attribute needs to be set in a certain way.

Cheers.

In my project I did something similar and used decorators, as manu suggested. The @property decorator might work for you: it basically turns the method .df() into a property .df, so the CSV is only read when the attribute is actually accessed (note that, as written, it is re-read on every access). This only works on instances of the class.

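import pandas as pd

class CustomDF:

    def __init__(self, filename):
        self.filename = filename

    @property
    def df(self):
        # Reads the CSV each time .df is accessed
        return pd.read_csv(self.filename)

    def custom_method_1(self, *args, **kwargs):
        # do_custom_operations_on is a placeholder for your own logic
        result_1 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_1

    def custom_method_2(self, *args, **kwargs):
        result_2 = do_custom_operations_on(self.df, *args, **kwargs)
        return result_2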
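A plain property calls `pd.read_csv` every time `.df` is accessed. If the goal is to read the file lazily but only once, `functools.cached_property` (Python 3.8+) is one option. In this sketch the `source` attribute and the `column_range` method are hypothetical.

```python
import io
from functools import cached_property

import pandas as pd


class CustomDF:
    def __init__(self, filepath_or_buffer):
        # Only the source is stored; nothing is read yet.
        self.source = filepath_or_buffer

    @cached_property
    def df(self):
        # Runs on the first access only; the result is then stored on the
        # instance, so later accesses return the same DataFrame.
        return pd.read_csv(self.source)

    def column_range(self, column):
        # Hypothetical custom method using the lazily loaded data.
        return self.df[column].max() - self.df[column].min()


obj = CustomDF(io.StringIO("a,b\n1,10\n4,20\n"))
print("df" in vars(obj))      # False: nothing read yet
print(obj.column_range("a"))  # 3
print(obj.df is obj.df)       # True: the same cached DataFrame
```

If the file changes on disk, `del obj.df` drops the cached value so that the next access reads the file again.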
