
When using a Python dataclass, where is the correct place to process the data for initializing the dataclass?

In Python, I am using a dataclass named "MyDataClass" to store data returned by an HTTP response. Let's say the response content is JSON like this, and I need only the first two fields:

{
    "name": "Test1",
    "duration": 4321,
    "dont_care": "some_data",
    "dont_need": "some_more_data"
}

and now I have two options:

Option 1

resp: dict = ...  # the response's content, parsed as JSON
my_data_class = MyDataClass(name=resp['name'], duration=resp['duration'])

where I take advantage of the dataclass's automatically generated __init__ method

or

Option 2

resp: dict = ...  # the response's content, parsed as JSON
my_data_class = MyDataClass(resp)

and leave the processing to a hand-written __init__ method on the dataclass, like this:

def __init__(self, resp: dict) -> None:
    self.name: str = resp['name']
    self.duration: int = resp['duration']

I prefer the 2nd option, but I would like to know if there is a right way to do this.

Thanks.

You only need the first two fields for now, until you actually end up needing more. IMO it'll be way easier to let the dataclass's __init__() method take care of that. Otherwise you would have to change BOTH the function call (MyDataClass(name=.....)) AND the dataclass init. With the 2nd option you have only one place where you need to intervene.

Unless the don't-care/don't-need data is huge and you're taking a performance hit because of it, don't worry about passing it along: premature optimization is the root of all evil. So keep it simple and flexible for as long as you can!

Let's say that in the future you want to extract more data from the response and store it in the dataclass. With Option 1, you would need to add arguments to the __init__ method and also update every place where you instantiate the dataclass. Option 2 is therefore preferable, since it reduces code redundancy and keeps the data extraction logic in one place.

You should absolutely try to avoid overriding a dataclass's __init__ method. There is quite a bit of magic that you'll simply lose by overriding it. Among other things, __post_init__ won't be called properly unless you write that call yourself, which is not trivial.
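As a rough illustration of that point, here is a minimal sketch. The class names Generated and HandWritten and the post_init_ran flag are hypothetical, made up for this example. It relies on the documented behavior that the dataclass decorator does not replace an __init__ the class already defines, so __post_init__ is no longer called automatically.

```python
from dataclasses import dataclass


@dataclass
class Generated:
    name: str
    duration: int
    post_init_ran: bool = False  # hypothetical flag, only for the demo

    def __post_init__(self) -> None:
        # Called at the end of the generated __init__.
        self.post_init_ran = True


@dataclass
class HandWritten:
    name: str
    duration: int
    post_init_ran: bool = False  # hypothetical flag, only for the demo

    def __init__(self, resp: dict) -> None:
        # A user-defined __init__ is kept; the decorator does not replace it.
        self.name = resp["name"]
        self.duration = resp["duration"]
        # __post_init__ is NOT called unless it is invoked here by hand.

    def __post_init__(self) -> None:
        self.post_init_ran = True


resp = {"name": "Test1", "duration": 4321, "dont_care": "some_data"}
generated = Generated(name=resp["name"], duration=resp["duration"])
hand_written = HandWritten(resp)

print(generated.post_init_ran)     # True
print(hand_written.post_init_ran)  # False
```

The other generated methods (__repr__, __eq__) still work in the second class; what is lost is the generated constructor and everything it does on your behalf.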

The reason dataclass works this way is that it is meant to be a very simple one-to-one mapping of your business data into a programmatic structure. As a consequence, any additional logic you add that has nothing to do with that core idea takes away from the usefulness of dataclass.

So I'd suggest sticking with option 1.


If writing out the wanted attributes by hand becomes too much of a nuisance, you can consider writing a classmethod that filters unwanted attributes for you, and allows you to just splat the dictionary like this:

dataclass_instance = MyDataClass.from_request(**resp)

Here is a post that explains how to do just that, where the accompanying question also touches on some of your issues.
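One possible shape for such a classmethod is sketched below. This is an assumption about how from_request could be written, not the code from the linked post. It uses dataclasses.fields to look up the declared field names and drops every other key before calling the generated __init__.

```python
from dataclasses import dataclass, fields


@dataclass
class MyDataClass:
    name: str
    duration: int

    @classmethod
    def from_request(cls, **resp):
        # Keep only the keys that match a declared dataclass field.
        wanted = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in resp.items() if k in wanted})


resp = {
    "name": "Test1",
    "duration": 4321,
    "dont_care": "some_data",
    "dont_need": "some_more_data",
}

dataclass_instance = MyDataClass.from_request(**resp)
print(dataclass_instance)  # MyDataClass(name='Test1', duration=4321)
```

With this approach, needing another field from the response later means adding one line to the class body; the call sites and the generated __init__ stay untouched.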
