
dict attribute 'type' to select subclass of dataclass

I have the following class:

from dataclasses import dataclass
from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None

and the two subclasses:

@dataclass_json
@dataclass
class Csv(Source):
    csv_path: str = None
    delimiter: str = ';'

and

@dataclass_json
@dataclass
class Parquet(Source):
    parquet_path: str = None

Given the following dictionaries:

parquet = {'type': 'Parquet', 'label': 'events', 'path': '/.../test.parquet', 'parquet_path': '../../result.parquet'}
csv = {'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter': ','}

Now I would like to do something like

Source().from_dict(csv) 

and have the output be an instance of the class Csv or Parquet. I understand that if you instantiate the class Source, the method from_dict just loads the parameters into that class. But is there any way to do this through some kind of inheritance, without a "constructor" that runs an if / else-if / else chain over all possible 'type' values?

Pureconfig, a Scala library, creates different case classes when the attribute 'type' holds the name of the desired subclass. Is this possible in Python?

You can build a helper that picks and instantiates the appropriate subclass.

def from_data(data: dict, tp: type):
    """Create the subtype of ``tp`` for the given ``data``"""
    subtype = [
        stp for stp in tp.__subclasses__()  # look through all subclasses...
        if stp.__name__ == data['type']     # ...and select by type name
    ][0]
    return subtype(**data)  # instantiate the subtype

This can be called with your data and the base class from which to select:

>>> from_data(
...     {'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter':','},
...     Source,
... )
Csv(type='Csv', label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')
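
One caveat with the helper above: __subclasses__() only returns direct subclasses, and indexing with [0] raises a bare IndexError when the type name is unknown. The following is a minimal sketch, not the answerer's code, that walks the whole hierarchy and fails with a clearer message. The helper name iter_subclasses is made up for illustration, and the classes are redefined without dataclass_json so the block runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None


def iter_subclasses(tp: type) -> Iterator[type]:
    """Yield every direct and indirect subclass of ``tp`` (hypothetical helper)."""
    for stp in tp.__subclasses__():
        yield stp
        yield from iter_subclasses(stp)


def from_data(data: dict, tp: type):
    """Create the subtype of ``tp`` named by ``data['type']``."""
    for stp in iter_subclasses(tp):
        if stp.__name__ == data['type']:
            return stp(**data)  # instantiate the matching subtype
    raise ValueError(f"unknown {tp.__name__} type: {data['type']!r}")
```

Whether the extra recursion is worth it depends on how deep your hierarchy is; for the two flat subclasses in the question, the original helper behaves the same for valid input.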

If you need to run this often, it is worth building a dict to optimise the subtype lookup. A simple means is to add a method to your base class, and store the lookup there:

@dataclass_json
@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None

    @classmethod
    def from_data(cls, data: dict):
        if not hasattr(cls, '_lookup'):
            cls._lookup = {stp.__name__: stp for stp in cls.__subclasses__()}
        return cls._lookup[data["type"]](**data)

This can be called directly on the base class:

>>> Source.from_data({'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter':','})
Csv(type='Csv', label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')
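
Note that a lookup built lazily on the first call will not see subclasses defined afterwards. One possible alternative, sketched here under my own assumptions rather than taken from the answer, is to let each subclass register itself through __init_subclass__. The attribute name _registry is hypothetical, and dataclass_json is left out so the block is self-contained.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, Optional


@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None

    # Maps subclass name -> subclass. ClassVar keeps it out of the dataclass fields.
    _registry: ClassVar[Dict[str, type]] = {}

    def __init_subclass__(cls, **kwargs):
        # Runs once for every subclass at class-definition time.
        super().__init_subclass__(**kwargs)
        Source._registry[cls.__name__] = cls

    @classmethod
    def from_data(cls, data: dict) -> "Source":
        # A missing or unknown 'type' surfaces as a KeyError here.
        return cls._registry[data["type"]](**data)


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None
```

This also picks up indirect subclasses, since __init_subclass__ is inherited, at the cost of a small amount of hidden machinery in the base class.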

This is a variation on my answer to this question.

@dataclass_json
@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None

    def __new__(cls, type=None, **kwargs):
        for subclass in cls.__subclasses__():
            if subclass.__name__ == type:
                break
        else:
            subclass = cls
        instance = super(Source, subclass).__new__(subclass)
        return instance

assert type(Source(**csv)) == Csv
assert type(Source(**parquet)) == Parquet
assert Csv(**csv) == Source(**csv)
assert Parquet(**parquet) == Source(**parquet)

You asked and I am happy to oblige. However, I'm questioning whether this is really what you need. I think it might be overkill for your situation. I originally figured this trick out so I could instantiate directly from data when...

  • my data was heterogeneous and I didn't know ahead of time which subclass was appropriate for each datum,
  • I didn't have control over the data, and
  • figuring out which subclass to use required some processing of the data, processing which I felt belonged inside the class (for logical reasons as well as to avoid polluting the scope in which the instantiating took place).

If those conditions apply to your situation, then I think this is a worthwhile approach. If not, the added complexity of mucking with __new__, a moderately advanced maneuver, might not be justified by the savings in complexity in the code used to instantiate. There are probably simpler alternatives.

For example, it appears as though you already know which subclass you need; it's one of the fields in the data. If you put it there, presumably whatever logic you wrote to do so could be used to instantiate the appropriate subclass right then and there, bypassing the need for my solution. Alternatively, instead of storing the name of the subclass as a string, store the subclass itself. Then you could do this: data['type'](**data)

It also occurs to me that maybe you don't need inheritance at all. Do Csv and Parquet store the same type of data, differing only in which file format they read it from? Then maybe you just need one class with from_csv and from_parquet methods. Alternatively, if one of the parameters is a filename, it would be easy to figure out which type of file parsing you need based on the filename extension. Normally I'd put this in __init__, but since you're using dataclass, I guess this would happen in __post_init__.
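
As a rough illustration of the filename-extension idea, here is a minimal sketch of a single dataclass that infers its format in __post_init__. The FORMATS mapping and the choice to store the result in the type field are assumptions made for this example, not something the question specifies.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping from file extension to format name.
FORMATS = {'.csv': 'Csv', '.parquet': 'Parquet'}


@dataclass
class Source:
    label: Optional[str] = None
    path: Optional[str] = None
    delimiter: str = ';'  # only meaningful for CSV files
    # Derived from the path, so it is not accepted as a constructor argument.
    type: str = field(init=False, default='')

    def __post_init__(self):
        # Compare the extension case-insensitively.
        suffix = PurePath(self.path or '').suffix.lower()
        if suffix not in FORMATS:
            raise ValueError(f"cannot infer format from path: {self.path!r}")
        self.type = FORMATS[suffix]
```

With this shape there is no 'type' key to keep in sync with the path, though it only fits if both formats really carry the same fields.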

Do you need this behavior?

from dataclasses import dataclass
from typing import Optional, Union, List

from validated_dc import ValidatedDC


@dataclass
class Source(ValidatedDC):
    label: Optional[str] = None
    path: Optional[str] = None


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None


@dataclass
class InputData(ValidatedDC):
    data: List[Union[Parquet, Csv]]


# Let's say you got a json-string and loaded it:
data = [
    {
        'label': 'events', 'path': '/.../test.parquet',
        'parquet_path': '../../result.parquet'
    },
    {
        'label': 'events', 'path': '/.../test.csv',
        'csv_path': '../../result.csv', 'delimiter': ','
    }

]


input_data = InputData(data=data)

for item in input_data.data:
    print(item)

# Parquet(label='events', path='/.../test.parquet', parquet_path='../../result.parquet')
# Csv(label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')

validated_dc: https://github.com/EvgeniyBurdin/validated_dc
