
A solution to SQLAlchemy temporary table pain?

It seems like the biggest drawback with SQLAlchemy is that it takes several steps backwards when it comes to working with temporary tables. A very common use case, for example, is to create a temporary table that is very specific to one task, throw some data in it, then join against it.

For starters, declaring a temporary table is verbose and limited. Note that I had to edit this example because my classes actually inherit from a base class, so what I give here may be slightly incorrect.

from sqlalchemy import BigInteger, Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class MyTempTable(Base):

    __tablename__ = "temp"
    __table_args__ = {'prefixes': ['TEMPORARY']}

    id = Column(Integer(), primary_key=True)
    person_id = Column(BigInteger())
    a_string = Column(String(100))

Creating it is unintuitive:

MyTempTable.__table__.create(session.bind)

I also have to remember to explicitly drop it, unless I do something creative to get it to render with ON COMMIT DROP:

MyTempTable.__table__.drop(session.bind)
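
For what it's worth, the PostgreSQL dialect accepts a postgresql_on_commit table option, so ON COMMIT DROP may not need much creativity. A minimal sketch, using a Core Table with the same columns as the model above; it only compiles the DDL for PostgreSQL and never connects to a database:

```python
from sqlalchemy import BigInteger, Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

metadata = MetaData()

# Same columns as MyTempTable above, declared as a Core Table.
temp = Table(
    "temp",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("person_id", BigInteger),
    Column("a_string", String(100)),
    prefixes=["TEMPORARY"],        # renders CREATE TEMPORARY TABLE
    postgresql_on_commit="DROP",   # PostgreSQL-specific table option
)

# Render the CREATE statement for PostgreSQL without executing it.
ddl = str(CreateTable(temp).compile(dialect=postgresql.dialect()))
print(ddl)
```

With the declarative form, the same option should be usable by adding 'postgresql_on_commit': 'DROP' to the __table_args__ dictionary, since those entries are passed through to the Table.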

Also, what I just gave doesn't even work unless the temporary table is created at the "top level". I still haven't fully figured this out (I didn't want to spend the time investigating why), but when I tried creating a temp table this way inside a nested transaction started with session.begin_nested(), I got an error saying the relation does not exist. However, I have several cases where I create a temporary table inside a nested transaction for unit testing purposes, and they work just fine. Checking the echo output, the difference appears to be that in one case the statement renders before the BEGIN statement, while in the other it renders after it. This is using PostgreSQL.
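
One plausible explanation, offered as an assumption rather than a diagnosis: session.bind is normally the Engine, so .create(session.bind) may check out a different pooled connection from the one the session's transaction is using, and a temporary table is only visible on the connection that created it. Passing session.connection() keeps the DDL on the session's own connection. The sketch below runs against in-memory SQLite so it needs no server; the table name temp_people is hypothetical, and SQLAlchemy 1.4 or later is assumed.

```python
from sqlalchemy import BigInteger, Column, Integer, MetaData, String, Table
from sqlalchemy import create_engine, func, select
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata = MetaData()

temp_people = Table(
    "temp_people",  # hypothetical table name
    metadata,
    Column("id", Integer, primary_key=True),
    Column("person_id", BigInteger),
    Column("a_string", String(100)),
    prefixes=["TEMPORARY"],
)

with Session(engine) as session:
    # Emit the DDL on the connection the session is using,
    # rather than letting the engine pick one from the pool.
    temp_people.create(session.connection())

    session.execute(
        temp_people.insert(),
        [{"person_id": 1, "a_string": "a"}, {"person_id": 2, "a_string": "b"}],
    )
    row_count = session.execute(
        select(func.count()).select_from(temp_people)
    ).scalar_one()

print(row_count)
```

The same idea applies to the drop: MyTempTable.__table__.drop(session.connection()) would target the connection that actually holds the table.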

What does work inside of a nested transaction, and quite frankly saves you a bunch of time, is to just type out the damned SQL and execute it using session.execute.

        session.execute(text(
            "CREATE TEMPORARY TABLE temp ("
            "  id SERIAL,"
            "  person_id BIGINT,"
            "  a_string TEXT"
            ") ON COMMIT DROP;"
        ))

Of course, if you do this, you still need a corresponding table model to make use of ORM functionality, or you have to stick to raw SQL queries, which defeats the purpose of SQLAlchemy in the first place.
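
The corresponding model does not have to be the thing that creates the table, though. A hedged sketch: raw SQL creates the table, and a Table definition with matching columns (written by hand here, which is an assumption about how the two are kept in sync) is used only to build statements. It runs against in-memory SQLite, so SERIAL and ON COMMIT DROP from the PostgreSQL statement above are left out; the name temp_people is hypothetical, and SQLAlchemy 1.4 or later is assumed.

```python
from sqlalchemy import BigInteger, Column, Integer, MetaData, String, Table
from sqlalchemy import create_engine, select, text
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata = MetaData()

# Describes the table for query building; never created via metadata.
temp_people = Table(
    "temp_people",  # hypothetical table name
    metadata,
    Column("id", Integer, primary_key=True),
    Column("person_id", BigInteger),
    Column("a_string", String(100)),
)

with Session(engine) as session:
    # Hand-written DDL, executed on the session's connection.
    session.execute(text(
        "CREATE TEMPORARY TABLE temp_people ("
        "  id INTEGER PRIMARY KEY,"
        "  person_id BIGINT,"
        "  a_string TEXT"
        ")"
    ))
    # From here on, the Table object is enough for inserts and selects.
    session.execute(
        temp_people.insert(),
        [{"person_id": 1, "a_string": "a"}, {"person_id": 2, "a_string": "b"}],
    )
    stmt = (
        select(temp_people.c.person_id, temp_people.c.a_string)
        .where(temp_people.c.person_id > 1)
    )
    rows = [tuple(row) for row in session.execute(stmt)]

print(rows)
```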

I'm wondering if maybe I'm missing something here or if someone has come up with a solution that is a bit more elegant.

I use the ORM together with Core. I reserve the ORM for higher-level operations; for large volumes of data and for temp tables, Core is handier. Example:

temptbl_name = 'temp_del_dup_pk_{}'.format(datestamp)
temptbl = Table(temptbl_name, metadata, Column('col1', Integer, index=True),..., extend_existing=True)
temptbl.create(engine)
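
To show the Core approach end to end, here is a small sketch of the "fill a temp table, then join against it" pattern. The table and column names (people, wanted_ids) are hypothetical, in-memory SQLite stands in for the real database, and SQLAlchemy 1.4 or later is assumed. Everything runs on one connection so the temporary table stays visible.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy import create_engine, select

engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata = MetaData()

people = Table(
    "people",  # hypothetical regular table
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
wanted = Table(
    "wanted_ids",  # hypothetical temporary table
    metadata,
    Column("person_id", Integer, primary_key=True),
    prefixes=["TEMPORARY"],
)

with engine.begin() as conn:
    people.create(conn)
    conn.execute(
        people.insert(),
        [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}],
    )

    # Create and fill the temp table on the same connection.
    wanted.create(conn)
    conn.execute(wanted.insert(), [{"person_id": 1}, {"person_id": 3}])

    # Join the regular table against the temp table.
    stmt = (
        select(people.c.name)
        .select_from(people.join(wanted, people.c.id == wanted.c.person_id))
        .order_by(people.c.id)
    )
    names = [row[0] for row in conn.execute(stmt)]

print(names)
```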

Update: Here is a simple function that generates a temp table ORM definition on the fly:

def temp_table(name, cols):
    args = dict(col1=Column(Integer, index=True),...)
    args['__tablename__'] = name
    args['__table_args__'] = dict(extend_existing=True)
    return type(name, (Base,), args)

It can be useful to mirror columns of an existing table:

def temp_table(name, base_table):
    args = {c.name:c.copy() for c in base_table.__table__.c}
    ...
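
As a rough illustration of the mirroring idea, the sketch below builds a mapped class for a TEMPORARY table from an existing model. It is a variation on the snippets above, not the answer's own code: Person and temp_table_like are hypothetical names, only each column's name, type, primary-key flag and nullability are carried over (indexes, defaults and constraints are ignored), and SQLAlchemy 1.4 or later with in-memory SQLite is assumed.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Person(Base):  # hypothetical existing model
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


def temp_table_like(name, model):
    """Return a mapped class for a TEMPORARY table mirroring ``model``."""
    attrs = {
        # Build fresh Column objects; a Column cannot belong to two tables.
        c.name: Column(c.name, c.type, primary_key=c.primary_key, nullable=c.nullable)
        for c in model.__table__.c
    }
    attrs["__tablename__"] = name
    attrs["__table_args__"] = {"prefixes": ["TEMPORARY"]}
    class_name = name.title().replace("_", "")  # "person_tmp" -> "PersonTmp"
    return type(class_name, (Base,), attrs)


PersonTmp = temp_table_like("person_tmp", Person)

engine = create_engine("sqlite://")  # in-memory database, for illustration only
with Session(engine) as session:
    # Create the temp table on the session's own connection.
    PersonTmp.__table__.create(session.connection())
    session.add_all([PersonTmp(id=1, name="Ann"), PersonTmp(id=2, name="Bob")])
    session.flush()
    names = session.execute(
        select(PersonTmp.name).order_by(PersonTmp.id)
    ).scalars().all()

print(names)
```

Note that the generated class registers its table in Base.metadata, so a later Base.metadata.create_all() would try to create it as well.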

I decided to build on this answer, as I wanted a more flexible way to create a copy table from an existing model while still supporting index definitions and playing nice with Alembic*.

I find this approach useful both for creating true temporary tables and for creating on-the-fly tables that will be swapped with the main table. The latter is where you can run into trickier Alembic scenarios if the definitions don't match perfectly.

* With my particular usage patterns

import time
import warnings

import sqlalchemy as sa


def copy_table_args(model, **kwargs):
    # Fall back to an empty dict for models that define no __table_args__.
    table_args = getattr(model, "__table_args__", {})

    if isinstance(table_args, tuple):
        new_args = []
        for arg in table_args:
            if isinstance(arg, dict):
                table_args_dict = arg.copy()
                table_args_dict.update(**kwargs)
                new_args.append(table_args_dict)
            elif isinstance(arg, sa.Index):
                index = sa.Index(
                    arg.name,
                    *arg.columns.keys(),
                    unique=arg.unique,
                    **arg.kwargs,
                )
                new_args.append(index)
            elif isinstance(arg, sa.UniqueConstraint):
                new_args.append(arg.copy())
            else:
                # TODO: need to handle other Constraints
                raise Exception(f"Unhandled table arg: {arg}")
        table_args = tuple(new_args)
    elif isinstance(table_args, dict):
        table_args = {
            k: (v.copy() if hasattr(v, "copy") else v) for k, v in table_args.items()
        }
        table_args.update(**kwargs)
    else:
        raise Exception(f"Unexpected __table_args__ type: {table_args}")

    return table_args


def copy_table_from_model(conn, model, **kwargs):
    model_name = model.__name__ + "Tmp"
    table_name = model.__table__.name + "_" + str(time.time()).replace(".", "_")
    table_args = copy_table_args(model, extend_existing=True, **kwargs)

    args = {c.name: c.copy() for c in model.__table__.c}
    args["__tablename__"] = table_name
    args["__table_args__"] = table_args

    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=sa.exc.SAWarning)
        copy_model = type(model_name, model.__bases__, args)
        copy_model.__table__.create(conn)
    return copy_model


def temp_table_from_model(conn, model):
    return copy_table_from_model(conn, model, prefixes=["TEMPORARY"])

Note: I haven't added logic to copy constraints other than UniqueConstraint, and this is only lightly tested against MySQL.
