Dataclass in Python when the attribute doesn't respect naming rules

If you have data like this (from a YAML file):

items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
  ...

How would you load that into a dataclass that is explicit about the keys and types it has? Ideally I would have:

@dataclasses.dataclass
class X:
    C>A/G>T: str
    C>G/G>C: str
...

Update:

SBS_Mutations = TypedDict(
    "SBS_Mutations",
    {
        "C>A/G>T": str,
        "C>G/G>C": str,
        "C>T/G>A": str,
        "T>A/A>T": str,
        "T>C/A>G": str,
        "T>G/A>C": str,
    },
)

my_data = {....}

SBS_Mutations(my_data) # not sure how to use it here

If you want symbols like that, they obviously can't be Python identifiers, so there is little point in wanting the attribute access that a dataclass gives you.

Just keep your data in dictionaries, or in Pandas DataFrames, where such names can be column titles.
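If the data stays in a dictionary, the TypedDict from the question's update can still document the expected keys. The following is a minimal sketch, assuming the mapping has already been loaded (for instance by yaml.safe_load). The check_keys helper is hypothetical and not part of the typing module. A TypedDict is only enforced by static type checkers, so the helper adds a simple runtime check.

```python
from typing import TypedDict

# Functional syntax is required because the keys are not valid identifiers.
SBS_Mutations = TypedDict(
    "SBS_Mutations",
    {
        "C>A/G>T": str,
        "C>G/G>C": str,
    },
)


def check_keys(data: dict) -> SBS_Mutations:
    """Hypothetical helper: verify keys and value types at runtime."""
    expected = SBS_Mutations.__annotations__
    missing = expected.keys() - data.keys()
    extra = data.keys() - expected.keys()
    if missing or extra:
        raise KeyError(f"missing={sorted(missing)}, extra={sorted(extra)}")
    for key, typ in expected.items():
        if not isinstance(data[key], typ):
            raise TypeError(f"{key!r} should be {typ.__name__}")
    return data  # type: ignore[return-value]


# Stand-in for the "items" mapping loaded from the YAML file.
mutations = check_keys({"C>A/G>T": "#string", "C>G/G>C": "#string"})
print(mutations["C>A/G>T"])
```

Access stays key-based (mutations["C>A/G>T"]), but a type checker can now flag misspelled keys.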

Otherwise, post a proper code snippet with a minimal example of where you are getting the data from. Then an answer can show a suitable place to translate your original names into valid Python attribute names, and help build a dynamic dataclass from them.
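As a rough illustration of that idea, here is a sketch that uses only the standard library. The to_identifier rule, the make_record_class and load helpers, and the "key" metadata entry are all assumptions made for this example. The raw dict stands in for whatever the YAML loader returns.

```python
import dataclasses
import keyword
import re


def to_identifier(key: str) -> str:
    """Hypothetical rule: lower-case, runs of other characters become '_'."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", key).strip("_").lower()
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = f"f_{name}"
    return name


def make_record_class(class_name: str, keys):
    """Build a dataclass with one str field per original key."""
    fields = [
        # The original key is kept in the field metadata for later lookup.
        (to_identifier(key), str, dataclasses.field(metadata={"key": key}))
        for key in keys
    ]
    return dataclasses.make_dataclass(class_name, fields)


def load(cls, data: dict):
    """Instantiate cls from a dict that still uses the original keys."""
    return cls(
        **{f.name: data[f.metadata["key"]] for f in dataclasses.fields(cls)}
    )


raw = {"C>A/G>T": "#string1", "C>G/G>C": "#string2"}
Mutations = make_record_class("Mutations", raw)
record = load(Mutations, raw)
print(record)
# Mutations(c_a_g_t='#string1', c_g_g_c='#string2')
```

A class built at runtime gives attribute access but no static type hints or IDE completion, and two keys that sanitize to the same name would make make_dataclass raise an error.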

This sounds like a good use case for my dotwiz library, which I have recently published. It provides a dict subclass that enables attribute-style dot access for nested keys.

As of the recent release, it offers a DotWizPlus implementation (a wrapper around a dict object) that also case-transforms keys so that they are valid, lower-cased Python identifier names, as shown below.

# requires the following dependencies:
#   pip install PyYAML dotwiz
import yaml
from dotwiz import DotWizPlus

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""

yaml_dict = yaml.safe_load(yaml_str)
print(yaml_dict)

dw = DotWizPlus(yaml_dict)
print(dw)

assert dw.items.c_a_g_t == '#string'  # True

print(dw.to_attr_dict())

Output:

{'items': {'C>A/G>T': '#string', 'C>G/G>C': '#string'}}
✪(items=✪(c_a_g_t='#string', c_g_g_c='#string'))
{'items': {'c_a_g_t': '#string', 'c_g_g_c': '#string'}}

NB: This currently fails when accessing the key items from a plain DotWiz instance, because the key name conflicts with the built-in method dict.items(). I've submitted a bug report and hope to work through this edge case in particular.
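This kind of name clash is not specific to one library. It can be reproduced with any dict subclass that exposes keys through __getattr__. The sketch below uses a hypothetical AttrDict class, not dotwiz's actual implementation, and whether DotWiz hits exactly this mechanism is an assumption.

```python
class AttrDict(dict):
    """Hypothetical dict subclass with attribute-style access to keys."""

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so existing dict methods such as .items always take precedence.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


d = AttrDict(items={"c_a_g_t": "#string"}, other=1)

print(d.other)            # 1
print(callable(d.items))  # True: the dict.items method, not the stored key
print(d["items"])         # {'c_a_g_t': '#string'}
```

Item access (d["items"]) still reaches the stored value, which is one way to sidestep the clash.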

Type Hinting

If you want type hinting or auto-suggestions for field names, you can try something like this, where you subclass DotWizPlus:

import yaml
from dotwiz import DotWizPlus


class Item(DotWizPlus):
    c_a_g_t: str
    c_g_g_c: str

    @classmethod
    def from_yaml(cls, yaml_string: str, loader=yaml.safe_load):
        yaml_dict = loader(yaml_string)
        return cls(yaml_dict['items'])


yaml_str = """
items:
  C>A/G>T: "#string1"
  C>G/G>C: "#string2"
"""

dw = Item.from_yaml(yaml_str)
print(dw)
# ✪(c_a_g_t='#string1', c_g_g_c='#string2')

assert dw.c_a_g_t == '#string1'  # True

# auto-completion will work, as IDE knows the type is a `str`
# dw.c_a_g_t.

Dataclasses

If you would still prefer dataclasses for type-hinting purposes, there is another library you can check out called dataclass-wizard, which can help simplify this task as well.

More specifically, YAMLWizard makes it easier to load/dump a class object with YAML. Note that this uses the PyYAML library behind the scenes by default.

Note that I couldn't get the case transform to work here. I suspect a bug in the underlying to_snake_case() implementation, and I'm going to submit a bug report to look into this edge case. For now, it should work if the YAML key name is specified explicitly for each field:

from dataclasses import dataclass

from dataclass_wizard import YAMLWizard, json_field

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""


@dataclass
class Container(YAMLWizard):
    items: 'Item'


@dataclass
class Item:
    c_a_g_t: str = json_field('C>A/G>T')
    c_g_g_c: str = json_field('C>G/G>C')


c = Container.from_yaml(yaml_str)
print(c)

# True
assert c.items.c_g_g_c == c.items.c_a_g_t == '#string'

Output:

Container(items=Item(c_a_g_t='#string', c_g_g_c='#string'))
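For comparison, the same explicit key mapping can be sketched with the standard library alone, at the cost of writing the loading code by hand. In this sketch the key helper, the "key" metadata entry, and the from_dict/to_dict methods are assumptions made for illustration, not part of dataclass-wizard. The input dict stands in for the items mapping parsed from YAML.

```python
from dataclasses import dataclass, field, fields


def key(name: str):
    """Hypothetical helper: a required field that remembers its source key."""
    return field(metadata={"key": name})


@dataclass
class Item:
    c_a_g_t: str = key("C>A/G>T")
    c_g_g_c: str = key("C>G/G>C")

    @classmethod
    def from_dict(cls, data: dict) -> "Item":
        # Look up each field's value under its original key.
        return cls(**{f.name: data[f.metadata["key"]] for f in fields(cls)})

    def to_dict(self) -> dict:
        # Map back to the original keys, e.g. before dumping to YAML.
        return {f.metadata["key"]: getattr(self, f.name) for f in fields(self)}


item = Item.from_dict({"C>A/G>T": "#string", "C>G/G>C": "#string"})
print(item)
print(item.to_dict())
```

Because the fields are declared statically, IDEs and type checkers see c_a_g_t and c_g_g_c as str attributes.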
