When using a Python dataclass, where is the correct place to process the data for initializing the dataclass?

Question

In Python, I am using a dataclass named "MyDataClass" to store data returned by an HTTP response. Let's say the response content is JSON like this, and I need only the first two fields:

{
    "name": "Test1",
    "duration": 4321,
    "dont_care": "some_data",
    "dont_need": "some_more_data"
}

and now I have two options:

Option 1

resp: dict = ...  # the response's content, parsed from JSON
my_data_class = MyDataClass(name=resp['name'], duration=resp['duration'])

where I take advantage of the dataclass's automatically generated __init__ method

or

Option 2

resp: dict = ...  # the response's content, parsed from JSON
my_data_class = MyDataClass(resp)

and leave the processing to the dataclass's __init__ method, like this:

def __init__(self, resp: dict) -> None:
    # resp is the parsed JSON dict from above
    self.name: str = resp['name']
    self.duration: int = resp['duration']

I prefer the second option, but I would like to know whether there is a right way to do this.

Thanks.

Answer 1

You only need the first two fields for now, until you end up needing more. IMO it will be much easier to handle that inside the dataclass's __init__() method. Otherwise you would have to change both the call site (MyDataClass(name=...)) and the dataclass's __init__. With the second option, there is only one place you need to touch.

Unless the don't-care/don't-need data is huge and you're taking a performance hit because of it... premature optimization is the root of all evil. So keep it simple and flexible as long as you can!

Answer 2

Suppose that in the future you want to extract more data from the response and store it in the dataclass. With Option 1, you would need to add arguments to the __init__ method and also update every place where you instantiate the dataclass. Option 2 is therefore preferable, since it reduces code redundancy and keeps the data extraction logic in one place.

Answer 3

You should absolutely try to avoid overriding a dataclass's __init__ method. There is quite a bit of magic that you lose by overriding it. Among other things, __post_init__ will no longer be called for you unless you reimplement that behavior yourself, which is not trivial.
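To make the __post_init__ point concrete, here is a minimal sketch. The class names Generated and HandWritten and the post_init_ran attribute are made up for illustration; only the field names come from the question.

```python
from dataclasses import dataclass


@dataclass
class Generated:
    name: str
    duration: int

    def __post_init__(self) -> None:
        # The generated __init__ calls this automatically as its last step.
        self.post_init_ran = True  # hypothetical marker attribute


@dataclass
class HandWritten:
    name: str
    duration: int

    def __init__(self, resp: dict) -> None:
        # @dataclass keeps an __init__ defined in the class body,
        # so nothing here calls __post_init__ unless we do it ourselves.
        self.name = resp["name"]
        self.duration = resp["duration"]

    def __post_init__(self) -> None:
        self.post_init_ran = True  # never reached via HandWritten(resp)


resp = {
    "name": "Test1",
    "duration": 4321,
    "dont_care": "some_data",
    "dont_need": "some_more_data",
}

generated = Generated(name=resp["name"], duration=resp["duration"])
hand_written = HandWritten(resp)

print(hasattr(generated, "post_init_ran"))     # True
print(hasattr(hand_written, "post_init_ran"))  # False
```

The hand-written version still gets the generated __repr__ and __eq__, but any validation or derived values placed in __post_init__ are silently skipped.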

The reason dataclass works this way is that it is supposed to be a very simple one-to-one mapping of your business data onto a programmatic structure. As a consequence, any additional logic you add that has nothing to do with that core idea takes away from the usefulness of the dataclass.

So I'd suggest sticking to option 1.

If writing out the wanted attributes by hand becomes too much of a nuisance, you can consider writing a classmethod that filters out the unwanted attributes for you and lets you simply splat the dictionary, like this:

dataclass_instance = MyDataClass.from_request(**resp)

Here is a post that explains how to do just that, where the accompanying question also touches on some of your issues.
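A minimal sketch of what such a classmethod could look like, assuming the two fields from the question. The body of from_request is one possible implementation rather than the one from the linked post: it uses dataclasses.fields to discover the declared field names and drops every other key.

```python
from dataclasses import dataclass, fields


@dataclass
class MyDataClass:
    name: str
    duration: int

    @classmethod
    def from_request(cls, **resp) -> "MyDataClass":
        # Keep only the keys that match declared dataclass fields.
        wanted = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in resp.items() if k in wanted})


resp = {
    "name": "Test1",
    "duration": 4321,
    "dont_care": "some_data",
    "dont_need": "some_more_data",
}

dataclass_instance = MyDataClass.from_request(**resp)
print(dataclass_instance)  # MyDataClass(name='Test1', duration=4321)
```

With this approach the generated __init__ stays untouched, and adding a field to the dataclass later is enough for from_request to pick it up from the response.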
