简体   繁体   中英

Python design pattern for magic methods and pandas inheritance

So I basically put this question for some advice

I have few classes which basically does some pandas operations and return a dataframe. But these dataframes needs addition and subtraction on some filter options. I planned to write a class to override __add__ and __sub__ methods, so that these dataframes are added or subtracted by my code which implements those filters. Below is a basic structure

import ArithOperation
class A:
  def dataA(self, filenameA):
    dfa = pd.read_excel(filenameA)
    return ArithOperation(dfA)

class B:
  def dataB(self, filenameB):
    dfb = pd.read_excel(filenameB)
    return ArithOperation(dfB)

dfA and dfB here are pandas dataframes.

class ArithOperation:
  def __init__(self, df):
     self.df = df
  def __add__(self, other):
     # here the filtering and manual addition is done
     return ArithOperation(sumdf)
  def __sub__(self, other):
     # here the filtering and manual subtraction is done
     return ArithOperation(deltadf)

Basically I do the calculation as below

dfa = A().dataA()
dfb = B().dataB()

sumdf = dfa+dfb
deltadf = dfa-dfb

But how do I make the sumdf and deltadf have default dataframe functions too. I know I should inherit dataframe class to ArithOperation but I am confused and bit uncomfortable of instantiating ArithOperation() at many places.

  1. Is there a better design pattern for this problem?
  2. Which class to inherit on ArithOperation so that I have all pandas dataframe functions too on ArithOperation object ?

It looks to me you want to customize the behaviour of an existent type (dataframes in this case, but it could be any type). Usually we want to alter some of the behaviour to a different one, or we want to extend that behaviour, or we want to do both. When I say behaviour think methods).

You can also choose between a has-a approach (that is wrapping an object, by creating a new class whose objects hold a reference to the original object. That way you can create several new, different or similar methods that make new things, eventually using some of the existing ones, by using the stored reference to invoke original methods. This way you kind of adapt the original class interface to a different one. This is known as a wrapper pattern (or adapter pattern).

That is what you have made. But then you face a problem: how do you accept all of the possible methods of the original class? You will have to rewrite all the possible methods (not pratical), just to delegate them on the original class, or you find a way of delegating them all except the few ones you override. I will not cover this last possibility, because you have inheritance at your disposal and that makes things like that quite straightforward.

Just inherit from the original class and you'll be able to create objects with the same behaviour as the parent class. If you need new data members, override __init__ and add those but don't forget to invoke the parent's class __init__ , otherwise the object won't initialize properly. That's where you use super().__init__() like described below. If you want to add new methods, just add the usual defs to your child class. If you want to extend existant methods, do as described for the __init__ method. If you want to completely override a method just write your version with a def with the same name of the original.method, it will totally replace the original.

So in your case you want to inherit from dataframe class, you'll have to either choose if you need a custom __init__ or not. If you need define one but do not forget to call the parent's original inside it. Then define your custom methods, say a new __add__ and __sub__ , that either replace or extend the original ones (with the same technique). Be careful in not defining methods that you think are new, but actually existed in the original class, cause they will be overriden. This is an small inconvenience of inheriting, specially if the original has an interface with lots of methods.

Use super() to extend a parent's class behaviour

class A:
    pass # has some original __init__

class B(A): # B inherits A
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs) # preserves parent initialization, passing the same received arguments. If you don't do this parent __init__ will be overrided, not initialising parent object properly.
        # add your custom code
        pass

The thing with super() is that it solves some problems that arise when using multiple inheritance (inherit from several classes), when determining which of the parents method should be called, if you have several with the same named method; it is recommended over calling the parent's method with SomeParentClass.method() inside the child class.

Also if you subclass a type because you need it, you shouldn't be afraid of using it "everywhere" as you said. Make sure you customize it for a good reason (so think if making this to dataframes is appropriate, or is there another simpler way of achieving the same objective without doing that; can't advise you here, I haven't experience with pandas), then use it instead of the original class.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM