
How to do an upsert with SqlAlchemy?

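I have a record that I want to exist in the database if it is not there, and if it is there already (primary key exists) I want the fields to be updated to the current state. This is often called an upsert.

The following incomplete code snippet demonstrates what will work, but it seems excessively clunky (especially if there were a lot more columns). What is the better/best way?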

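import sqlalchemy.orm.exc
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

# Session is assumed to be a configured sessionmaker (not shown in this snippet)
Base = declarative_base()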
class Template(Base):
    __tablename__ = 'templates'
    id = Column(Integer, primary_key = True)
    name = Column(String(80), unique = True, index = True)
    template = Column(String(80), unique = True)
    description = Column(String(200))
    def __init__(self, Name, Template, Desc):
        self.name = Name
        self.template = Template
        self.description = Desc

def UpsertDefaultTemplate():
    sess = Session()
    desired_default = Template("default", "AABBCC", "This is the default template")
    try:
        q = sess.query(Template).filter_by(name = desired_default.name)
        existing_default = q.one()
    except sqlalchemy.orm.exc.NoResultFound:
        #default does not exist yet, so add it...
        sess.add(desired_default)
    else:
        #default already exists.  Make sure the values are what we want...
        assert isinstance(existing_default, Template)
        existing_default.name = desired_default.name
        existing_default.template = desired_default.template
        existing_default.description = desired_default.description
    sess.flush()

Is there a better or less verbose way of doing this? Something like this would be great:

sess.upsert_this(desired_default, unique_key = "name")

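Although the unique_key kwarg is obviously unnecessary (the ORM should be able to figure this out easily), I added it because SQLAlchemy tends to work only with the primary key. For example, I have been looking at whether Session.merge would be applicable, but it works only on the primary key, which in this case is an autoincrementing id that is not terribly useful for this purpose.

A sample use case is starting up a server application that may have upgraded its default expected data. That is, there are no concurrency concerns for this upsert.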

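SQLAlchemy does have a "save-or-update" behavior, which in recent versions has been built into session.add, but previously was the separate session.saveorupdate call. This is not an "upsert", but it may be good enough for your needs.

It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, the problem would be simple enough: if nothing exists with the given ID, or if the ID is None, create a new record; otherwise update all other fields in the existing record with that primary key.

However, when there are additional unique constraints, that simple approach has logical problems. If you want to "upsert" an object, and the primary key of your object matches an existing record, but another unique column matches a different record, then what do you do? Similarly, if the primary key matches no existing record, but another unique column does match an existing record, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.

That is the reason there is no built-in "upsert" operation. The application must define what this means in each particular case.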
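
For the simple case described above, where the primary key is the only unique key, Session.merge already behaves like an upsert. The following is a minimal sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; the Setting model and the merge_setting helper are hypothetical names used only for illustration.

```python
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Setting(Base):
    """Hypothetical model whose natural key is also its primary key."""
    __tablename__ = "settings"
    name = Column(String(80), primary_key=True)
    value = Column(String(200))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)


def merge_setting(name, value):
    """Insert the setting if its primary key is new, otherwise update it."""
    with Session(engine) as session:
        # merge() looks the object up by primary key and copies the
        # attribute values onto the persistent row, or adds a new row.
        merged = session.merge(Setting(name=name, value=value))
        session.commit()
        return merged.value
```

Calling merge_setting("theme", "light") and then merge_setting("theme", "dark") should leave a single row whose value is "dark". This does not help when the key to match on is a unique column other than the primary key, which is the situation in the question.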

SQLAlchemy supports ON CONFLICT with two methods, on_conflict_do_update() and on_conflict_do_nothing().

Copied from the documentation:

from sqlalchemy.dialects.postgresql import insert

stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    index_where=my_table.c.user_email.like('%@gmail.com'),
    set_=dict(data=stmt.excluded.data)
)
conn.execute(stmt)
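
To see what such a statement sends to the database without connecting to one, it can be compiled against the PostgreSQL dialect. This is a small sketch under assumptions: the my_table definition below is invented for illustration (the documentation snippet does not show it), and the partial-index condition is left out to keep the output short.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()

# Assumed table layout; only user_email and data appear in the snippet above.
my_table = Table(
    "my_table",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_email", String, unique=True),
    Column("data", String),
)

stmt = insert(my_table).values(user_email="a@b.com", data="inserted data")
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    # "excluded" refers to the row that was proposed for insertion
    set_={"data": stmt.excluded.data},
)

# Compiling needs only the dialect, not a live PostgreSQL connection.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

The printed SQL should contain an ON CONFLICT (user_email) clause followed by a DO UPDATE that assigns excluded.data to the data column.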

I use a "look before you leap" approach:

# first get the object from the database if it exists
# we're guaranteed to only get one or zero results
# because we're filtering by primary key
switch_command = session.query(Switch_Command).\
    filter(Switch_Command.switch_id == switch.id).\
    filter(Switch_Command.command_id == command.id).first()

# If we didn't get anything, make one
if not switch_command:
    switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)

# update the stuff we care about
switch_command.output = 'Hooray!'
switch_command.lastseen = datetime.datetime.utcnow()

session.add(switch_command)
# This will generate either an INSERT or UPDATE
# depending on whether we have a new object or not
session.commit()

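The advantage is that this is database-neutral and, I think, clear to read. The disadvantage is that there is a potential race condition in a scenario like the following:

  • We query the db for a switch_command and don't find one
  • We create a switch_command
  • Another process or thread creates a switch_command with the same primary key as ours
  • We try to commit our switch_command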
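
One way to narrow that window while staying database-neutral is to attempt the INSERT first and fall back to an UPDATE when the database reports a key conflict. The sketch below is an illustration rather than part of the answer above; it assumes SQLAlchemy 1.4 or later, an in-memory SQLite database, and a hypothetical SwitchCommand model standing in for Switch_Command.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SwitchCommand(Base):
    """Hypothetical stand-in for the Switch_Command model."""
    __tablename__ = "switch_command"
    switch_id = Column(Integer, primary_key=True)
    command_id = Column(Integer, primary_key=True)
    output = Column(String(200))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)


def insert_or_update(switch_id, command_id, output):
    """Try the INSERT first; if the key already exists, UPDATE instead."""
    with Session(engine) as session:
        session.add(
            SwitchCommand(switch_id=switch_id, command_id=command_id, output=output)
        )
        try:
            session.commit()
            return "inserted"
        except IntegrityError:
            # Another writer (or an earlier run) created the row first.
            session.rollback()
            row = session.get(SwitchCommand, (switch_id, command_id))
            row.output = output
            session.commit()
            return "updated"
```

The sketch still assumes the conflicting row is not deleted between the rollback and the lookup; code that has to survive that case would need to retry.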

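Nowadays, SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. These functions are useful but require you to switch from the ORM interface to the lower-level one, SQLAlchemy Core.

Although these two functions make upserting with SQLAlchemy's syntax not that difficult, they are far from providing a complete out-of-the-box solution to upserting.

My common use case is to upsert a large chunk of rows in a single SQL query/session execution. I usually encounter two problems with upserting:

For example, the higher-level ORM functionality we have gotten used to is missing. You cannot use ORM objects but instead have to provide ForeignKeys at the time of insertion.

I use the following function, which I wrote, to handle both of these issues: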

from sqlalchemy import UniqueConstraint, inspect
from sqlalchemy.dialects import postgresql

def upsert(session, model, rows):
    table = model.__table__
    stmt = postgresql.insert(table)
    primary_keys = [key.name for key in inspect(table).primary_key]
    update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}

    if not update_dict:
        raise ValueError("insert_or_update resulted in an empty update_dict")

    stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                      set_=update_dict)

    seen = set()
    foreign_keys = {col.name: list(col.foreign_keys)[0].column for col in table.columns if col.foreign_keys}
    unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]
    def handle_foreignkeys_constraints(row):
        for c_name, c_value in foreign_keys.items():
            foreign_obj = row.pop(c_value.table.name, None)
            row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None

        for const in unique_constraints:
            unique = tuple([const,] + [row[col.name] for col in const.columns])
            if unique in seen:
                return None
            seen.add(unique)

        return row

    rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
    session.execute(stmt, rows)
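
The seen set in the function above drops rows that repeat a unique key within the same batch. This matters because PostgreSQL refuses to let a single INSERT ... ON CONFLICT DO UPDATE statement affect the same row twice. That step on its own can be sketched in plain Python; dedupe_rows is a hypothetical helper name, and like the function above it keeps the first row seen for each key.

```python
def dedupe_rows(rows, key_columns):
    """Keep the first row seen for each combination of key column values."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            # A row with this key is already in the batch; skip the repeat.
            continue
        seen.add(key)
        unique_rows.append(row)
    return unique_rows


batch = [
    {"name": "default", "template": "AABBCC"},
    {"name": "default", "template": "DDEEFF"},  # repeats the key "default"
    {"name": "other", "template": "112233"},
]
unique_batch = dedupe_rows(batch, ["name"])
```

If the last occurrence should win instead, iterating over reversed(rows) and reversing the result again is one possible variation.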

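There are multiple answers, and here comes yet another answer (YAA). Other answers are not that readable because of the metaprogramming involved. Here is an example that

  • Uses SQLAlchemy ORM

  • Shows how to create a row if there are zero rows, using on_conflict_do_nothing

  • Shows how to update the existing row (if any) without creating a new row, using on_conflict_do_update

  • Uses the table primary key as the constraint

A longer example, showing what this code relates to, is in the original question.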


import datetime

import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy import text
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

class PairState(Base):

    __tablename__ = "pair_state"

    # This table has 1-to-1 relationship with Pair
    pair_id = sa.Column(sa.ForeignKey("pair.id"), nullable=False, primary_key=True, unique=True)
    pair = orm.relationship(Pair,
                        backref=orm.backref("pair_state",
                                        lazy="dynamic",
                                        cascade="all, delete-orphan",
                                        single_parent=True, ), )


    # First raw event in data stream
    first_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # Last raw event in data stream
    last_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # The last hypertable entry added
    last_interval_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    @staticmethod
    def create_first_event_if_not_exist(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Sets the first event value if not exist yet."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, first_event_at=ts).
            on_conflict_do_nothing()
        )

    @staticmethod
    def update_last_event(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_event_at for a named pair."""
        # Based on the original example of https://stackoverflow.com/a/49917004/315168
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_event_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_event_at": ts})
        )

    @staticmethod
    def update_last_interval(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_interval_at for a named pair."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_interval_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_interval_at": ts})
        )

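This works for me with sqlite3 and postgres. It may fail with composite primary key constraints, though, and will most likely fail with additional unique constraints.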

    try:
        t = self._meta.tables[data['table']]
    except KeyError:
        self._log.error('table "%s" unknown', data['table'])
        return

    try:
        q = insert(t).values(data['values'])
        self._log.debug(q)
        self._db.execute(q)
    except IntegrityError:
        self._log.warning('integrity error')
        # match the existing row on every primary key column
        where_clause = [c == data['values'][c.name] for c in t.c if c.primary_key]
        update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
        q = update(t).where(*where_clause).values(update_dict)
        self._log.debug(q)
        self._db.execute(q)
    except Exception as e:
        self._log.error('%s: %s', t.name, e)

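The code below works fine for me with a redshift database and also works with a composite primary key constraint.

Source: this

Only a few modifications are needed to create the SQLAlchemy engine in the function def start_engine():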

import os

from sqlalchemy import Column, Integer, Date, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.dialects import postgresql

Base = declarative_base()

def start_engine():
    engine = create_engine(os.getenv('SQLALCHEMY_URI',
                                     'postgresql://localhost:5432/upsert'))
    connect = engine.connect()
    meta = MetaData()
    meta.reflect(bind=engine)
    return engine


class DigitalSpend(Base):
    __tablename__ = 'digital_spend'
    report_date = Column(Date, nullable=False)
    day = Column(Date, nullable=False, primary_key=True)
    impressions = Column(Integer)
    conversions = Column(Integer)

    def __repr__(self):
        return str([getattr(self, c.name, None) for c in self.__table__.c])


def compile_query(query):
    compiler = (query.compile if not hasattr(query, 'statement')
                else query.statement.compile)
    return compiler(dialect=postgresql.dialect())


def upsert(session, model, rows, as_of_date_col='report_date', no_update_cols=[]):
    table = model.__table__

    stmt = insert(table).values(rows)

    update_cols = [c.name for c in table.c
                   if c not in list(table.primary_key.columns)
                   and c.name not in no_update_cols]

    on_conflict_stmt = stmt.on_conflict_do_update(
        index_elements=table.primary_key.columns,
        set_={k: getattr(stmt.excluded, k) for k in update_cols},
        index_where=(getattr(model, as_of_date_col) < getattr(stmt.excluded, as_of_date_col))
        )

    print(compile_query(on_conflict_stmt))
    session.execute(on_conflict_stmt)


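# start_engine() returns an Engine, so wrap it in a Session before executing
session = sessionmaker(bind=start_engine())()
upsert(session, DigitalSpend, initial_rows, no_update_cols=['conversions'])
session.commit()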
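
The idea of overwriting a row only when the incoming data is newer can also be expressed with the where argument of on_conflict_do_update, which adds a condition to the DO UPDATE part. The following is a sketch under assumptions, not the code above: it uses the SQLite dialect (SQLAlchemy 1.4 or later, SQLite 3.24 or later) so that it runs without a server, a trimmed digital_spend table, and a hypothetical helper named upsert_if_newer.

```python
import datetime

from sqlalchemy import Column, Date, Integer, MetaData, Table, create_engine, select
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()

# Trimmed version of the digital_spend table, for illustration only.
digital_spend = Table(
    "digital_spend",
    metadata,
    Column("day", Date, primary_key=True),
    Column("report_date", Date, nullable=False),
    Column("impressions", Integer),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)


def upsert_if_newer(conn, rows):
    """Insert rows; on a key conflict, update only if report_date is newer."""
    stmt = insert(digital_spend)
    stmt = stmt.on_conflict_do_update(
        index_elements=[digital_spend.c.day],
        set_={
            "report_date": stmt.excluded.report_date,
            "impressions": stmt.excluded.impressions,
        },
        # Stored row is kept unless the proposed row has a later report_date.
        where=digital_spend.c.report_date < stmt.excluded.report_date,
    )
    conn.execute(stmt, rows)
```

With this condition, a row carrying an older or equal report_date is silently ignored on conflict, while a row with a later report_date replaces the stored values.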

This allows access to the underlying models based on string names:

def get_class_by_tablename(tablename):
  """Return class reference mapped to table.
  https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to
  :param tablename: String with name of table.
  :return: Class reference or None.
  """
  for c in Base._decl_class_registry.values():
    if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
      return c


sqla_tbl = get_class_by_tablename(table_name)

def handle_upsert(record_dict, table):
    """
    handles updates when there are primary key conflicts

    """
    try:
        # `session` is assumed to be an active Session available in this scope
        session.add(table(**record_dict))
        session.flush()  # emit the INSERT now so a key conflict is raised here
    except:
        # Here we'll assume the error is caused by an integrity error
        # We do this because the error classes are passed from the
        # underlying package (pyodbc / sqlite) SQLAlchemy doesn't mask
        # them with its own code - this should be updated to have
        # explicit error handling for each new db engine

        # <update>add explicit error handling for each db engine</update>
        session.rollback()
        # Query for conflict class, use update method to change values based on dict
        c_tbl_primary_keys = [i.name for i in table.__table__.primary_key] # List of primary key col names
        c_tbl_cols = dict(table.__table__.columns) # String:Col Object crosswalk

        c_query_dict = {k:record_dict[k] for k in c_tbl_primary_keys if k in record_dict} # sub-dict from data of primary key:values
        c_oo_query_dict = {c_tbl_cols[k]:v for (k,v) in c_query_dict.items()} # col-object:query value for primary key cols

        c_target_record = session.query(table).filter(*[k==v for (k,v) in c_oo_query_dict.items()]).first()

        # apply new data values to the existing record
        for k, v in record_dict.items():
            setattr(c_target_record, k, v)

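In the case of sqlite, the sqlite_on_conflict='REPLACE' option can be used when defining a UniqueConstraint, and sqlite_on_conflict_unique for a unique constraint on a single column. Then session.add will work in a way similar to upsert. See the official documentation.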
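
A minimal sketch of the single-column variant, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The Template model is a trimmed copy of the one in the question, and save_template is a hypothetical helper name.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Template(Base):
    """Trimmed copy of the question's model, for illustration only."""
    __tablename__ = "templates"
    id = Column(Integer, primary_key=True)
    # Emits "UNIQUE (name) ON CONFLICT REPLACE" in the SQLite DDL.
    name = Column(String(80), unique=True, sqlite_on_conflict_unique="REPLACE")
    template = Column(String(80))


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)


def save_template(name, template):
    """Plain session.add; SQLite resolves a duplicate name by replacing the row."""
    with Session(engine) as session:
        session.add(Template(name=name, template=template))
        session.commit()
```

Note that SQLite's REPLACE resolution deletes the conflicting row before inserting the new one, so the surviving row may get a different id. That can matter if other tables reference it.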

Because we had problems with generated default ids and references, which led to ForeignKeyViolation errors like

update or delete on table "..." violates foreign key constraint
Key (id)=(...) is still referenced from table "...".

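we had to exclude the id from the update dict, as otherwise it would always be regenerated as a new default value.

In addition, the method returns the created/updated entity.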

from sqlalchemy import select
from sqlalchemy.dialects.postgresql import insert # Important to use the postgresql insert


def upsert(session, data, key_columns, model):

    stmt = insert(model).values(data)
    
    # Important to exclude the ID for update!
    exclude_for_update = [model.id.name, *key_columns]
    update_dict = {c.name: c for c in stmt.excluded if c.name not in exclude_for_update}

    stmt = stmt.on_conflict_do_update(
        index_elements=key_columns,
        set_=update_dict
    ).returning(model)

    orm_stmt = (
        select(model)
        .from_statement(stmt)
        .execution_options(populate_existing=True)
    )

    return session.execute(orm_stmt).scalar()

Example:


class UpsertUser(Base):
    __tablename__ = 'upsert_user'
    id = Column(Id, primary_key=True, default=uuid.uuid4)
    name: str = Column(sa.String, nullable=False)
    user_sid: str = Column(sa.String, nullable=False, unique=True)
    house_admin = relationship('UpsertHouse', back_populates='admin', uselist=False)


class UpsertHouse(Base):
    __tablename__ = 'upsert_house'
    id = Column(Id, primary_key=True, default=uuid.uuid4)
    admin_id: Id = Column(Id, ForeignKey('upsert_user.id'), nullable=False)
    admin: UpsertUser = relationship('UpsertUser', back_populates='house_admin', uselist=False)

# Usage

upserted_user = upsert(session, updated_user, [UpsertUser.user_sid.name], UpsertUser)

Note: Only tested on postgresql, but it could also work for other DBs that support ON DUPLICATE KEY UPDATE, e.g. MySQL
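
For SQLite, a comparable statement can be built with the insert construct from the sqlite dialect, which also offers on_conflict_do_update. The sketch below applies it to the Template model from the question, matching on the unique name column. It is an illustration under assumptions: SQLAlchemy 1.4 or later, SQLite 3.24 or later, and a hypothetical helper named upsert_by_columns. Instead of RETURNING it re-selects the row, and it updates only the columns present in data.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.dialects.sqlite import insert
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Template(Base):
    __tablename__ = "templates"
    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True, index=True)
    template = Column(String(80))
    description = Column(String(200))


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)


def upsert_by_columns(session, model, data, key_columns):
    """Insert data, or update the row that matches on key_columns."""
    table = model.__table__
    stmt = insert(table).values(data)
    # Never overwrite the primary key or the columns used for matching.
    skip = {c.name for c in table.primary_key} | set(key_columns)
    update_dict = {
        c.name: c for c in stmt.excluded if c.name in data and c.name not in skip
    }
    stmt = stmt.on_conflict_do_update(index_elements=key_columns, set_=update_dict)
    session.execute(stmt)
    # Re-select the row; populate_existing refreshes an already-loaded object.
    lookup = {k: data[k] for k in key_columns}
    query = (
        select(model).filter_by(**lookup).execution_options(populate_existing=True)
    )
    return session.execute(query).scalar_one()
```

Usage would look like upsert_by_columns(session, Template, {"name": "default", "template": "AABBCC", "description": "This is the default template"}, ["name"]), followed by session.commit().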
