Implementing an ARRAY-like column in SQLAlchemy (without PostgreSQL)

Question

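I know similar questions have been asked before, but I could not find an answer that fits my purpose. If there is one, I apologize, and please do mark my question as a duplicate!

I am working with databases that contain numeric values, using SQLAlchemy. Some of them are Cartesian 3D coordinates, in the form (x, y, z). For example, suppose I have an "Asset" class that represents a renderable object with a position and a mesh (the latter being just a file name stored as a string, for the sake of the example). This is what I would write right now: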

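from typing import Tuple

import sqlalchemy as sql
import sqlalchemy.orm  # makes sql.orm.Session and declarative_base available

# Declarative base class for the models (assumes SQLAlchemy 1.4 or later).
Base = sql.orm.declarative_base()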

class Asset(Base):
    """A renderable object in 3D."""
    __tablename__ = "assets"

    # A unique id for this asset.
    id = sql.Column(sql.Integer, primary_key=True)

    # File with the 3D object.
    mesh = sql.Column(sql.String(80), nullable=False)
    
    # 3D position of the object.
    position_x = sql.Column(sql.Float, nullable=False)
    position_y = sql.Column(sql.Float, nullable=False)
    position_z = sql.Column(sql.Float, nullable=False)

    @property
    def position(self):
        return self.position_x, self.position_y, self.position_z

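The position property is only there for convenience later in my code: I use the coordinates as a single 3D vector, and this at least removes the need to build such a structure by hand every time.

When I add new objects to the database, I want to avoid duplicates as much as possible. This means that whenever I try to add an object to the database, I first check whether an entry with the same properties already exists. If so, I do not add the object. This translates to the following code: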

def add_asset(mesh: str, position: Tuple[float, float, float]) -> bool:
    engine = sql.create_engine("DB_URL")
    with sql.orm.Session(engine) as session:
        # Check if an asset with the same properties already exists.
        x, y, z = position
        asset = session.query(Asset).filter(
            sql.func.abs(x - Asset.position_x) < 1e-6,
            sql.func.abs(y - Asset.position_y) < 1e-6,
            sql.func.abs(z - Asset.position_z) < 1e-6
        ).filter_by(mesh=mesh).first()

        # Match found? Exit and return "not added"!
        if asset is not None:
            return False

        # Match not found: add the asset and return "added".
        session.add(Asset(mesh=mesh, position_x=x, position_y=y, position_z=z))
        session.commit()
        return True

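sql.func.abs(x - Asset.position_x) < 1e-6 is there because Float values should not be filtered using equality. The "tolerance" here is fairly large, but that is intentional, since I do not need sub-micron precision!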
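
To illustrate why a tolerance is used instead of equality, here is a minimal plain-Python sketch with no database involved. The helper name is_close is made up for this illustration; it simply mirrors the abs(a - b) < tolerance test that the query performs on the SQL side.

```python
def is_close(a: float, b: float, tolerance: float = 1e-6) -> bool:
    """Return True when a and b differ by less than the tolerance."""
    return abs(a - b) < tolerance


# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
stored = 0.1 + 0.2
requested = 0.3

exact_match = stored == requested             # False
tolerant_match = is_close(stored, requested)  # True
```

The same reasoning applies to values that went through a round trip to the database: two coordinates that are "the same" for practical purposes may differ in their last bits.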

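Note that the code handling position_x, position_y and position_z is essentially identical. Moreover, I actually need to store N-dimensional vectors in many other places, where N is known in advance (not only at runtime) but varies with the context (I mean that I have 2D vectors in some places, 3D in others, even 6D vectors elsewhere, and so on).

What I would like to do is create a new type of column (let's call it Vector) that avoids writing the same lines of code N times. Ideally, I would like to turn the code above into: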

class Asset(Base):
    """A renderable object in 3D."""
    __tablename__ = "assets"

    # A unique id for this asset.
    id = sql.Column(sql.Integer, primary_key=True)

    # File with the 3D object.
    mesh = sql.Column(sql.String(80), nullable=False)
    
    # 3D position of the object.
    position = sql.Column(Vector(3), nullable=False)


def add_asset(mesh: str, position: Tuple[float, float, float]) -> bool:
    engine = sql.create_engine("DB_URL")
    with sql.orm.Session(engine) as session:
        # Check if an asset with the same properties already exists.
        asset = session.query(Asset).filter(
            close(position, Asset.position, 1e-6)
        ).filter_by(mesh=mesh).first()

        # Match found? Exit and return "not added"!
        if asset is not None:
            return False

        # Match not found: add the asset and return "added".
        session.add(Asset(mesh=mesh, position=position))
        session.commit()
        return True

I know that PostgreSQL has an ARRAY column type, but I cannot use it, by which I mean I cannot use PostgreSQL.

The closest I have got so far is:

from typing import Tuple

import sqlalchemy as sql
import sqlalchemy.orm  # makes sql.orm.Session available
from sqlalchemy.ext.hybrid import hybrid_property

class Asset(Base):
    """A renderable object in 3D."""
    __tablename__ = "assets"

    # A unique id for this asset.
    id = sql.Column(sql.Integer, primary_key=True)

    # File with the 3D object.
    mesh = sql.Column(sql.String(80), nullable=False)
    
    # 3D position of the object.
    px, py, pz = (sql.Column(sql.Float, nullable=False) for _ in range(3))

    @hybrid_property
    def position(self):
        return self.px, self.py, self.pz


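def close_to(properties, values, tolerance):
    # and_ combines the per-coordinate conditions; all_ is the SQL ALL operator and takes a single expression.
    return sql.and_(*(sql.func.abs(p - v) < tolerance for p, v in zip(properties, values)))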


def add_asset(mesh: str, position: Tuple[float, float, float]) -> bool:
    engine = sql.create_engine("DB_URL")
    with sql.orm.Session(engine) as session:
        # Check if an asset with the same properties already exists.
        asset = session.query(Asset).filter(
            close_to(Asset.position, position, 1e-6)
        ).filter_by(mesh=mesh).first()

        # Match found? Exit and return "not added"!
        if asset is not None:
            return False

        # Match not found: add the asset and return "added".
        session.add(Asset(mesh=mesh, px=position[0], py=position[1], pz=position[2]))
        session.commit()
        return True

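But I still do not like that, for every vector, I need to:

- Declare the fields separately: px, py, pz = (sql.Column(...) for _ in range(3))
- Add the hybrid_property manually.
- Initialize the attributes one by one in the constructor: px=position[0], py=position[1], pz=position[2].

Thanks in advance for your help!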

Answer 1

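I managed to come up with a "hacky" solution. I am not sure whether this is the way to go, but it seems to get the job done... so here it is!

The idea is as follows:

Create a dummy column class called VectorColumnPlaceHolder that only stores an integer: the dimension of the vector. It also has a static method called process, which I describe in more detail below.
Define the database class as usual. For vector elements, use something like position = VectorColumnPlaceHolder(3).
Then call the method VectorColumnPlaceHolder.process() on the class that contains vector columns. This is where the "real magic" happens. What this method does is:
- Scan all attributes of the class, looking for VectorColumnPlaceHolder instances.
- For each of them, add a set of columns named <column_name>_i, where i is an index ranging from 0 to N-1. For example, if we declare the column position = VectorColumnPlaceHolder(3), it is processed by adding the columns position_0, position_1 and position_2.
- In addition, add a hybrid property (with a getter and a setter) that allows all coordinates to be retrieved or modified at once. Being a hybrid property, it can also be used to build queries! This new property replaces the original VectorColumnPlaceHolder instance.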

My example above (Asset) becomes:

class Asset(Base):
    """A renderable object in 3D."""
    __tablename__ = "assets"

    # A unique id for this asset.
    id = sql.Column(sql.Integer, primary_key=True)

    # File with the 3D object.
    mesh = sql.Column(sql.String(80), nullable=False)
    
    # 3D position of the object.
    position = VectorColumnPlaceHolder(3)

# Replace VectorColumnPlaceHolder with columns and properties.
# After this call:
# - The columns Asset.position_0,  Asset.position_1 and  Asset.position_2 are added.
# - Asset.position (initially a VectorColumnPlaceHolder) is replaced with a hybrid property.
VectorColumnPlaceHolder.process(Asset)


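def close_to(properties, values, tolerance):
    # and_ combines the per-coordinate conditions; all_ is the SQL ALL operator and takes a single expression.
    return sql.and_(*(sql.func.abs(p - v) < tolerance
                      for p, v in zip(properties, values)))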


def add_asset(mesh: str, position: Tuple[float, float, float]) -> bool:
    engine = sql.create_engine("DB_URL")
    with sql.orm.Session(engine) as session:
        # Check if an asset with the same properties already exists.
        asset = session.query(Asset).filter(
            close_to(Asset.position, position, 1e-6)
        ).filter_by(mesh=mesh).first()

        # Match found? Exit and return "not added"!
        if asset is not None:
            return False

        # Match not found: add the asset and return "added".
        session.add(Asset(mesh=mesh, position=position))
        session.commit()
        return True

And this is the class VectorColumnPlaceHolder:

import sqlalchemy as sql
from sqlalchemy.ext.hybrid import hybrid_property


class VectorColumnPlaceHolder(object):
    """A "dummy" SQL column, representing an N-dimensional vector."""

    def __init__(self, n: int, column_factory=None):
        """Create a vector placeholder.

        Args:
            n: Dimension of the vector.
            column_factory: Callable that can create a new column. Vector coordinates are created as individual columns,
                and each of them is created by calling this function. The parameter is optional, and by default columns
                are created as sql.Column(sql.Float, nullable=False).
        """
        if n < 1:
            raise ValueError(f"Number of coordinates must be a positive integer, not '{n}'.")
        self.n = n
        if column_factory is None:
            self.create_coordinate = lambda: sql.Column(sql.Float, nullable=False)
        else:
            self.create_coordinate = column_factory

    @staticmethod
    def process(Class):
        """Dynamically add columns in place of VectorColumnPlaceHolder objects inside the given class.

        Args:
            Class: Class for which the VectorColumnPlaceHolder should be replaced by a set of columns and hybrid
                properties.
        """
        # Inspect all attributes of the class and process those that represent "vector columns".
        for column_name in dir(Class):
            column = getattr(Class, column_name)
            if isinstance(column, VectorColumnPlaceHolder):
                # Add one column per vector coordinate.
                for i in range(column.n):
                    setattr(Class, f"{column_name}_{i}", column.create_coordinate())

                # Add a hybrid property representing the set of coordinates as a whole. It can be used to access the
                # values as Class.column_name - both to retrieve the values and to use the fields in queries.
                @hybrid_property
                def vector_hybrid(self, column_name=column_name, n=column.n):
                    return tuple(getattr(self, f"{column_name}_{i}") for i in range(n))

                # Add a setter to give a value to each coordinate at once.
                @vector_hybrid.setter
                def vector_hybrid(self, value, column_name=column_name, n=column.n):
                    for i in range(n):
                        setattr(self, f"{column_name}_{i}", value[i])

                # Add the hybrid property to the class - this replaces the VectorColumnPlaceHolder.
                setattr(Class, column_name, vector_hybrid)
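
One detail of process that is easy to miss is the column_name=column_name, n=column.n default arguments on the getter and setter. Python closures look up variables when the function is called, not when it is defined, so without this binding every property created in the loop would end up using the last column_name. The following standalone sketch (plain Python, hypothetical function names, no SQLAlchemy) shows the difference:

```python
def make_getters_late(names):
    """Closures capture the loop variable itself, so all getters see the last name."""
    getters = []
    for name in names:
        getters.append(lambda: name)
    return getters


def make_getters_bound(names):
    """A default argument freezes the current value for each getter."""
    getters = []
    for name in names:
        getters.append(lambda name=name: name)
    return getters


late = [g() for g in make_getters_late(["position", "velocity"])]
bound = [g() for g in make_getters_bound(["position", "velocity"])]
print(late)   # ['velocity', 'velocity']
print(bound)  # ['position', 'velocity']
```

This only matters when a class declares more than one vector placeholder, which is exactly the multi-vector case the question is about.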

