Sharing attributes between child and parent class in Python

Question

I am building a script that processes multiple data files from different providers. Since the processing for each file is more or less identical, I decided to create a parent class 'Data' containing the methods for processing. I figured this would make my code more organized, and if a new data source was added, it would be simple to add to the script.

While each file will undergo the same processing, they have different details -- directory, file format, encoding, column names, etc. Each file has the same attributes, but their contents are obviously different. These attributes are unchanging, but since there will likely be >30 attributes, I wanted to be able to hard code them in separate modules rather than pass them as arguments to a new instance of the aforementioned 'Data' class.

My first thought was to create subclasses of the parent 'Data' class for each file. These subclasses would be in separate modules and have the attributes hard coded. Below is a very stripped down example:

import pandas as pd


class Data:

    def read_in(self):
        self.df = pd.read_csv(self.input_path, names = self.column_names)

    def arbitrary_process(self):
        # code interacting with self.df and other variables from Provider1/Provider2
        pass

    def save(self):
        self.df.to_csv(self.output_path)

class Provider1(Data):

    input_path = "provider1.txt"
    column_names = ['A', 'B', 'C', 'D']
    # more variables will be here...
    output_path = "provider1_output.txt"


class Provider2(Data):

    input_path = "provider2.txt"
    column_names = ['E', 'F', 'G', 'H']
    # more variables will be here...
    output_path = "provider2_output.txt"


if __name__ == '__main__':

    # processing...
    data1 = Provider1()
    data2 = Provider2()

    data1.read_in()
    data2.read_in()

    data1.arbitrary_process()
    data2.arbitrary_process()

    data1.save()
    data2.save()

Right off the bat, it doesn't feel proper to have methods in the parent class referencing attributes that are only defined in the child classes. However, due to the large number of attributes, I wasn't sure if passing them as arguments to the parent's init method would be the best option.

I'm sure there is a much more elegant solution to the problem, but it isn't jumping out at me. A possible solution doesn't have to include inheritance, but my main goal is to be able to hard-code the details of the files I'll be processing.

Thanks!

Answer 1

The object-oriented way to do this is to make your base class Data an abstract base class that calls a method, for example get_column_names, to get the list of column names. The subclass Provider1 would implement this method by returning the appropriate list, ['A', 'B', 'C', 'D']. You would, of course, need one method to override for each attribute that you currently define in the subclasses. For example:

from abc import ABCMeta, abstractmethod


class Data(metaclass=ABCMeta):

    def some_method(self):
        self.column_names = self.get_column_names()

    @abstractmethod
    def get_column_names(self):
        pass


class Provider1(Data):

    def get_column_names(self):
        return ['A', 'B', 'C', 'D']
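
To see what the abstract base class buys you, here is a minimal sketch that extends the example above. The describe method and the Incomplete class are hypothetical names added purely for illustration; they are not part of your code. The point it tries to show is that Python will refuse to instantiate any subclass that has not overridden every abstract method, so a forgotten attribute fails early instead of surfacing later as an AttributeError in the middle of processing.

```python
from abc import ABCMeta, abstractmethod


class Data(metaclass=ABCMeta):
    """Shared processing lives here; provider details come from subclasses."""

    def describe(self):
        # Hypothetical helper: relies on a value only subclasses supply.
        return "columns: " + ", ".join(self.get_column_names())

    @abstractmethod
    def get_column_names(self):
        """Return the list of column names for this provider."""


class Provider1(Data):

    def get_column_names(self):
        return ['A', 'B', 'C', 'D']


class Incomplete(Data):
    """Hypothetical subclass that forgets to override get_column_names."""


provider = Provider1()
summary = provider.describe()

try:
    Incomplete()
    instantiation_failed = False
except TypeError:
    # Raised because get_column_names is still abstract in Incomplete.
    instantiation_failed = True
```

With 30 or more attributes this means 30 or more small methods per provider, which is verbose, but each missing one is reported as soon as you try to create the object.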

Answer 1: accepted, 2019-02-20 23:01:52