Quickly locate an item in a dataclass containing a list of dataclasses by field value

I have a dataclass with this structure:

from dataclasses import dataclass
from typing import List

@dataclass
class PartData:
    id: int = 0
    name: str = None
    value: int = 0

@dataclass
class StockData:
    stock_1: List[PartData] = None
    stock_2: List[PartData] = None
    def __getitem__(self, key):
        return super().__getattribute__(key)

Now I create the dataclasses and fill them with items:

PARTS = [{"id": 1, "name": "screw"}, {"id": 3, "name": "bolt"}, {"id": 42, "name": "glue"}, {"id": 11, "name": "nail"}, {"id": 31, "name": "hammer"}, {"id": 142, "name": "paper"}]

dc_stock = StockData()

for p in PARTS:
    dc_part = PartData()
    dc_part.id = p["id"]
    if dc_part.id % 2 == 0:
        dc_stock_list = "stock_1"
    else:
        dc_stock_list = "stock_2"
    if getattr(dc_stock, dc_stock_list) is None:
        setattr(dc_stock, dc_stock_list, [dc_part])
    else:
        dc_stock[dc_stock_list].append(dc_part)

print(dc_stock)
# StockData(stock_1=[PartData(id=42, name=None, value=0), PartData(id=142, name=None, value=0)], 
#           stock_2=[PartData(id=1, name=None, value=0), PartData(id=3, name=None, value=0), PartData(id=11, name=None, value=0), PartData(id=31, name=None, value=0)]) 

I know I can loop over all items and compare them, but can I define a method that takes part_id as an argument and updates whichever item in dc_stock has that part_id with a new value? Can this be implemented as a method of StockData? Suppose I do not know whether the part is in stock_1 or stock_2.

Edit

For better understanding I want to share my approach, which looks very loopy and costly to me:

from dataclasses import dataclass, fields

@dataclass
class StockData:
    stock_1: List[PartData] = None
    stock_2: List[PartData] = None

    def __getitem__(self, key):
        return super().__getattribute__(key)

    def update_part(self, id, value):
        for stock_list in [f for f in fields(self) if f.name.startswith("stock")]:
            stock = getattr(self, stock_list.name)
            if stock:  # also skips a stock that is still None
                for part in stock:
                    if part.id == id:
                        part.value = value
                        return None


print(dc_stock)
dc_stock.update_part(1, 10)
print(dc_stock)

Here's one way to set it up. If you always look parts up by id, you can use a dict mapping each id to its part instead, since a dict lookup is much faster than searching a list for a part. I also cache the names of the dataclass fields that hold stocks, as that seems like a good idea too.

from dataclasses import dataclass, fields, field
from functools import cached_property
from typing import List, Dict, Union, Tuple


@dataclass
class PartData:
    id: int = 0
    name: str = None
    value: int = 0


@dataclass
class StockData:
    stock_1: Dict[int, PartData] = field(default_factory=dict)
    stock_2: Dict[int, PartData] = field(default_factory=dict)

    @cached_property
    def stock_fields(self) -> Tuple[str, ...]:
        return tuple(f.name for f in fields(self)
                     if f.name.startswith("stock"))

    @classmethod
    def from_parts(cls, parts: List[Dict[str, Union[str, int]]]):
        """Create a new `StockData` object from list of parts."""
        stock = cls()

        for p in parts:
            part = PartData(**p)
            if part.id % 2 == 0:
                stock_list = 'stock_1'
            else:
                stock_list = 'stock_2'

            getattr(stock, stock_list)[part.id] = part

        return stock

    def update_part(self, id, value):
        """Update value for a part, given the part id."""

        for stock_field in self.stock_fields:
            stock = getattr(self, stock_field)
            if id in stock:
                stock[id].value = value
                return None

Usage is pretty similar to how you had it. I also added a from_parts helper method, as constructing a StockData instance from a list of parts seems like it might be a common pattern. Note that since the stock fields are now dictionaries, you can call .values() to iterate over the PartData items in each stock.

def main():
    PARTS = [{"id": 1, "name": "screw"}, {"id": 3, "name": "bolt"},
             {"id": 42, "name": "glue"}, {"id": 11, "name": "nail"},
             {"id": 31, "name": "hammer"}, {"id": 142, "name": "paper"}]

    dc_stock = StockData.from_parts(PARTS)
    assert dc_stock.stock_2[1].value == 0

    print(dc_stock)

    dc_stock.update_part(1, 10)
    assert dc_stock.stock_2[1].value == 10

    print(dc_stock)

    print('Stock 1:')
    print(list(dc_stock.stock_1.values()))


if __name__ == '__main__':
    main()

What you're asking for is called indexing.

Basically, you keep a dict of the form {<field value>: <items with this value>} alongside your data structure and update it whenever you update the data.
It's even easier if the field is unique (as an item ID should be): each key then only needs to link to one item rather than to a list of items.
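
As a minimal sketch of the two index shapes described above (the `Part` class and the sample data are assumptions made up for this illustration, not part of the question's code):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Part:  # hypothetical record type for this sketch
    id: int
    name: str


parts = [Part(1, "screw"), Part(3, "bolt"), Part(42, "screw")]

# Unique field: each key maps to exactly one record.
by_id: Dict[int, Part] = {p.id: p for p in parts}

# Non-unique field: each key maps to a list of records.
by_name: Dict[str, List[Part]] = defaultdict(list)
for p in parts:
    by_name[p.name].append(p)

print(by_id[42])
print([p.id for p in by_name["screw"]])
```

Both dicts hold references to the same `Part` objects as the list, so changing a part's non-indexed fields through either one is visible everywhere.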

As you can see, keeping an index up to date is extra work, so it only pays off past a certain data size. It also matters how often the data is written versus read (an index costs time on every update but saves time on a select once the data is large enough that an index lookup is faster than iterating over the entire table) and what percentage of the queries will benefit from the index.
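
One rough way to check where that break-even point lies for your own data is to time both approaches. The sketch below assumes a hypothetical `Part` record and an arbitrary size `N`; the absolute numbers depend on the machine, so none are quoted here.

```python
import timeit
from dataclasses import dataclass


@dataclass
class Part:  # hypothetical record type for this sketch
    id: int
    value: int = 0


N = 10_000  # arbitrary size chosen for illustration
parts = [Part(i) for i in range(N)]
index = {p.id: p for p in parts}  # building this is the up-front cost


def scan(part_id):
    """Linear search: cost grows with the number of parts."""
    return next((p for p in parts if p.id == part_id), None)


def lookup(part_id):
    """Index lookup: a single dict access per query."""
    return index.get(part_id)


target = N - 1  # last element, the worst case for the scan
scan_time = timeit.timeit(lambda: scan(target), number=100)
lookup_time = timeit.timeit(lambda: lookup(target), number=100)
print(f"scan:   {scan_time:.4f}s for 100 queries")
print(f"lookup: {lookup_time:.4f}s for 100 queries")
```

Rerunning this with a smaller `N`, or with writes mixed in, gives a feel for when the index stops being worth its upkeep.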


First of all, consider not reinventing the wheel: instead of dataclasses, use a Pythonic ORM like SQLAlchemy, which supports indexing transparently. You don't need to run a DB server to benefit from it, as it can also use a serverless DB like SQLite as a backend. Moreover, a compiled backend will likely be much (orders of magnitude) faster than a pure Python one.
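
To give a feel for the serverless option, here is a small sketch that uses Python's built-in `sqlite3` module directly rather than SQLAlchemy, so it runs without extra packages. The table layout and the `update_part` helper are assumptions for illustration; an ORM would wrap the same idea in mapped classes.

```python
import sqlite3

# In-memory database: nothing is written to disk and no server is needed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part ("
    " id INTEGER PRIMARY KEY,"  # lookups by primary key avoid a full scan
    " name TEXT,"
    " stock TEXT,"
    " value INTEGER DEFAULT 0)"
)

PARTS = [{"id": 1, "name": "screw"}, {"id": 3, "name": "bolt"},
         {"id": 42, "name": "glue"}]
conn.executemany(
    "INSERT INTO part (id, name, stock) VALUES (?, ?, ?)",
    [(p["id"], p["name"], "stock_1" if p["id"] % 2 == 0 else "stock_2")
     for p in PARTS],
)


def update_part(part_id, value):
    """Set `value` for the part with this id, whichever stock it is in."""
    cur = conn.execute(
        "UPDATE part SET value = ? WHERE id = ?", (value, part_id))
    return cur.rowcount == 1  # False when no such part exists


update_part(1, 10)
print(conn.execute("SELECT * FROM part WHERE id = 1").fetchone())
# (1, 'screw', 'stock_2', 10)
```

Here the stock a part belongs to is just another column, so the caller never needs to know it in order to update the part.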


The way to integrate a dict-based index into your data structure would be to keep it in the table class (StockData) and command the table instance to update the index whenever any of the indexed fields is written (including when they are first initialized).

  • Probably the easiest way to do that is:
    • Keep a reference to the table instance in each record instance (a reference to just the update method is sufficient)
    • Use it to command the table instance to update its indices whenever an indexed field is written (including when it is first initialized)
  • If you don't want to modify field classes, your options are:
    • Do not write fields directly but only through some interface provided by the table class. This way, the table class's logic has an opportunity to update the index, because it gets control after the value is written but before returning to you
    • Update the index manually after any writes. This is error-prone (a recipe for eventual disaster), especially in more complex operations with many interdependent steps, because you may forget to call the update at an appropriate moment, or the syntax may not even let you (e.g. if you use generator expressions).
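
The "write only through the table" option could look roughly like the sketch below. The method names `add_part` and `change_id` and the `_by_id` field are hypothetical, and the sketch assumes callers never assign to `part.id` directly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PartData:
    id: int = 0
    name: Optional[str] = None
    value: int = 0


@dataclass
class StockData:
    stock_1: List[PartData] = field(default_factory=list)
    stock_2: List[PartData] = field(default_factory=list)
    # Hypothetical index, excluded from repr and comparisons.
    _by_id: Dict[int, PartData] = field(
        default_factory=dict, repr=False, compare=False)

    def add_part(self, part: PartData) -> None:
        """Store a part and register it in the index in one step."""
        if part.id in self._by_id:
            raise ValueError(f"duplicate part id {part.id}")
        target = self.stock_1 if part.id % 2 == 0 else self.stock_2
        target.append(part)
        self._by_id[part.id] = part

    def update_part(self, part_id: int, value: int) -> bool:
        """Set the value of the part with this id; False if it is absent."""
        part = self._by_id.get(part_id)
        if part is None:
            return False
        part.value = value
        return True

    def change_id(self, old_id: int, new_id: int) -> None:
        """Rewrite the indexed field via the table so the index follows.

        This sketch does not move the part between stock lists if the
        parity of the id changes.
        """
        if new_id in self._by_id:
            raise ValueError(f"duplicate part id {new_id}")
        part = self._by_id.pop(old_id)  # KeyError if old_id is unknown
        part.id = new_id
        self._by_id[new_id] = part


stock = StockData()
for p in [{"id": 1, "name": "screw"}, {"id": 42, "name": "glue"}]:
    stock.add_part(PartData(**p))

stock.update_part(1, 10)
stock.change_id(1, 5)
print(stock.stock_2)
```

Non-indexed fields such as `value` can still be written directly; only writes to `id` have to go through the table.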

Here's an illustration of what the "easiest way" option can look like:

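from typing import Dict

class PartData:
  <...>
  _table: "StockData"  # quoted because StockData is defined further down

  def __setattr__(self, key, new_value):
    # Intercept attribute writes so the table can keep its index current.
    if key == 'id':
      self._table.update_id_index(self, new_value, getattr(self, 'id', None))
    super().__setattr__(key, new_value)

class StockData:
  <...>
  # assuming id is unique; create this dict per instance, not on the class
  id_index: Dict[object, PartData]

  def update_id_index(self, record, new_value, old_value=None):
    # Drop the stale key (if any), then point the new key at the record.
    self.id_index.pop(old_value, None)
    self.id_index[new_value] = record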
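
A self-contained variant of this idea, filled in enough to run, might look like the following. The `add` and `update_part` methods, the single `parts` list, and the duplicate-id check are assumptions added for the sketch; the question's stock_1/stock_2 split is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class StockData:
    """Table: owns the records and an index on the unique `id` field."""

    def __init__(self) -> None:
        self.parts: List["PartData"] = []
        self.id_index: Dict[int, "PartData"] = {}  # one dict per instance

    def add(self, **kwargs) -> "PartData":
        part = PartData(self, **kwargs)
        self.parts.append(part)
        return part

    def update_id_index(self, record, new_value, old_value=None):
        if self.id_index.get(new_value, record) is not record:
            raise ValueError(f"duplicate part id {new_value}")
        # Remove the stale key only if it still points at this record.
        if self.id_index.get(old_value) is record:
            del self.id_index[old_value]
        self.id_index[new_value] = record

    def update_part(self, part_id, value):
        self.id_index[part_id].value = value


@dataclass
class PartData:
    # Back-reference to the owning table, hidden from repr and comparisons.
    _table: StockData = field(repr=False, compare=False)
    id: int = 0
    name: Optional[str] = None
    value: int = 0

    def __setattr__(self, key, new_value):
        # Runs for every attribute write, including those made by __init__.
        if key == "id":
            self._table.update_id_index(
                self, new_value, getattr(self, "id", None))
        super().__setattr__(key, new_value)


stock = StockData()
for p in [{"id": 1, "name": "screw"}, {"id": 42, "name": "glue"}]:
    stock.add(**p)

stock.update_part(1, 10)
stock.id_index[1].id = 7  # a direct write; the index follows on its own
print(sorted(stock.id_index))
# [7, 42]
```

Because `_table` is the first field, the generated `__init__` assigns it before `id`, which is what lets the hook reach the table during construction.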
