Dataclass in python when the attribute doesn't respect naming rules

If you have data like this (from a yaml file):

items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
  ...

How would I load that into a dataclass that is explicit about the keys and types it has? Ideally I would have:

@dataclasses.dataclass
class X:
    C>A/G>T: str
    C>G/G>C: str
...

Update:

SBS_Mutations = TypedDict(
    "SBS_Mutations",
    {
        "C>A/G>T": str,
        "C>G/G>C": str,
        "C>T/G>A": str,
        "T>A/A>T": str,
        "T>C/A>G": str,
        "T>G/A>C": str,
    },
)

my_data = {....}

SBS_Mutations(my_data) # not sure how to use it here

If you want symbols like that, they obviously can't be Python identifiers, so the attribute access that a dataclass gives you is of little use here.

Just keep your data in dictionaries, or in Pandas dataframes, where such names can be column titles.
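
As a rough sketch of the dictionary route, the TypedDict from the question's update can be paired with a small runtime check. A TypedDict is only a static typing aid and validates nothing at runtime, so the helper `load_mutations` below is a hypothetical name introduced for illustration, and only two of the six keys are shown for brevity:

```python
from typing import TypedDict, cast, get_type_hints

# Functional syntax is required because the keys are not valid identifiers.
SBS_Mutations = TypedDict(
    "SBS_Mutations",
    {
        "C>A/G>T": str,
        "C>G/G>C": str,
    },
)


def load_mutations(raw: dict) -> SBS_Mutations:
    """Hypothetical helper: check keys and value types, then return a typed dict."""
    hints = get_type_hints(SBS_Mutations)
    missing = hints.keys() - raw.keys()
    extra = raw.keys() - hints.keys()
    if missing or extra:
        raise KeyError(f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}")
    for key, expected in hints.items():
        if not isinstance(raw[key], expected):
            raise TypeError(f"{key!r} should be {expected.__name__}")
    # cast() only informs the type checker; the result is a plain dict at runtime.
    return cast(SBS_Mutations, dict(raw))


my_data = load_mutations({"C>A/G>T": "#string", "C>G/G>C": "#string"})
print(my_data["C>A/G>T"])
```

Values are still read with `my_data["C>A/G>T"]`, but a type checker can now flag misspelled keys.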

Otherwise, post a proper code snippet with a minimal example of where you are getting the data from. An answer can then show the right place to translate each original name into a valid Python attribute name, and help build a dynamic dataclass from it.
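
A minimal sketch of that translation step, using only the standard library, might look like the following. The helper names `to_identifier` and `build_dataclass` are hypothetical, and the sanitising rule (replace runs of non-word characters with `_`, then lower-case) is just one possible choice:

```python
import dataclasses
import keyword
import re


def to_identifier(key: str) -> str:
    """Hypothetical helper: turn an arbitrary key into a valid attribute name."""
    name = re.sub(r"\W+", "_", key).strip("_").lower()
    # Guard against empty names, leading digits and reserved words.
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = "_" + name
    return name


def build_dataclass(class_name: str, raw: dict):
    """Create a dataclass whose fields mirror the keys of `raw`, then instantiate it."""
    renamed = {to_identifier(key): value for key, value in raw.items()}
    # make_dataclass raises TypeError if two keys collapse to the same field name.
    fields = [(to_identifier(key), type(value)) for key, value in raw.items()]
    cls = dataclasses.make_dataclass(class_name, fields)
    return cls(**renamed)


item = build_dataclass("Item", {"C>A/G>T": "#string", "C>G/G>C": "#string"})
print(item.c_a_g_t)
```

Because the class is built at runtime, IDEs and type checkers cannot see its fields. If that matters, a statically declared class is the better fit.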

This sounds like a good use case for my dotwiz library, which I recently published. It provides a dict subclass that enables attribute-style dot access for nested keys.

As of the recent release, it offers a DotWizPlus implementation (a wrapper around a dict object) that also case-transforms keys so that they are valid, lower-cased Python identifier names, as shown below.

# requires the following dependencies:
#   pip install PyYAML dotwiz
import yaml
from dotwiz import DotWizPlus

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""

yaml_dict = yaml.safe_load(yaml_str)
print(yaml_dict)

dw = DotWizPlus(yaml_dict)
print(dw)

assert dw.items.c_a_g_t == '#string'  # True

print(dw.to_attr_dict())

Output:

{'items': {'C>A/G>T': '#string', 'C>G/G>C': '#string'}}
✪(items=✪(c_a_g_t='#string', c_g_g_c='#string'))
{'items': {'c_a_g_t': '#string', 'c_g_g_c': '#string'}}

NB: This currently fails when accessing the key items from a plain DotWiz instance, because the key name conflicts with the built-in dict.items() method. I've submitted a bug report and hope to work through this edge case.

Type Hinting

If you want type hinting or auto-suggestions for field names, you can try something like this, where you subclass DotWizPlus:

import yaml
from dotwiz import DotWizPlus


class Item(DotWizPlus):
    c_a_g_t: str
    c_g_g_c: str

    @classmethod
    def from_yaml(cls, yaml_string: str, loader=yaml.safe_load):
        # parse the string passed in, not the module-level `yaml_str`
        yaml_dict = loader(yaml_string)
        return cls(yaml_dict['items'])


yaml_str = """
items:
  C>A/G>T: "#string1"
  C>G/G>C: "#string2"
"""

dw = Item.from_yaml(yaml_str)
print(dw)
# ✪(c_a_g_t='#string1', c_g_g_c='#string2')

assert dw.c_a_g_t == '#string1'  # True

# auto-completion will work, as the IDE knows the type is `str`
# dw.c_a_g_t.

Dataclasses

If you would still prefer dataclasses for type-hinting purposes, there is another library you can check out called dataclass-wizard, which can help simplify this task as well.

More specifically, YAMLWizard makes it easier to load/dump a class object with YAML. Note that this uses the PyYAML library behind the scenes by default.

I couldn't get the automatic case transform to work here, which I suspect is a bug in the underlying to_snake_case() implementation. I'm going to submit a bug report for this edge case as well. For now, it works if the YAML key name is mapped explicitly for each field:

from dataclasses import dataclass

from dataclass_wizard import YAMLWizard, json_field

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""


@dataclass
class Container(YAMLWizard):
    items: 'Item'


@dataclass
class Item:
    c_a_g_t: str = json_field('C>A/G>T')
    c_g_g_c: str = json_field('C>G/G>C')


c = Container.from_yaml(yaml_str)
print(c)

# True
assert c.items.c_g_g_c == c.items.c_a_g_t == '#string'

Output:

Container(items=Item(c_a_g_t='#string', c_g_g_c='#string'))
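
For comparison, a similar explicit key mapping can be sketched with the standard library alone, by storing the original key in each field's metadata. The `"key"` metadata entry and the `from_dict` classmethod are conventions invented for this sketch, not part of the dataclasses API, and the input dict stands in for what yaml.safe_load returned above:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Item:
    # The "key" metadata entry is this sketch's own convention.
    c_a_g_t: str = field(metadata={"key": "C>A/G>T"})
    c_g_g_c: str = field(metadata={"key": "C>G/G>C"})

    @classmethod
    def from_dict(cls, raw: dict) -> "Item":
        # Look up each field's original key; a missing key raises KeyError.
        return cls(**{f.name: raw[f.metadata["key"]] for f in fields(cls)})


parsed = {"items": {"C>A/G>T": "#string", "C>G/G>C": "#string"}}
item = Item.from_dict(parsed["items"])
print(item)
# Item(c_a_g_t='#string', c_g_g_c='#string')
```

This keeps the field names and types visible to IDEs and type checkers. The cost is that every key has to be written out by hand.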
