How to specify the FROM tables in SQLAlchemy subqueries?

I am trying to fetch, in a single query, a fixed set of rows plus some other rows found by a subquery. My problem is that the query generated by my SQLAlchemy code is incorrect.

The query generated by SQLAlchemy is as follows:

SELECT tbl.id AS tbl_id
FROM tbl
WHERE tbl.id IN
(
SELECT t2.id AS t2_id
FROM tbl AS t2, tbl AS t1
WHERE t2.id =
(
SELECT t3.id AS t3_id
FROM tbl AS t3, tbl AS t1
WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1 OFFSET 0
)
AND t1.id IN (4, 8)
)
OR tbl.id IN (0, 8)

whereas the correct query should not have the second tbl AS t1 (the goal of this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8).

Unfortunately, I can't find how to get SQLAlchemy to generate the correct one (see the code below).

Suggestions for achieving the same result with a simpler query are also welcome (they need to be efficient, though: I tried a few variants and some were a lot slower on my real use case).

The code producing the query:

from sqlalchemy import create_engine, or_
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:', echo=True)
meta = MetaData(bind=engine)
table = Table('tbl', meta, Column('id', Integer))
session = sessionmaker(bind=engine)()
meta.create_all()

# Insert IDs 0, 2, 4, 6, 8.
i = table.insert()
i.execute(*[dict(id=i) for i in range(0, 10, 2)])
print session.query(table).all()
# output: [(0,), (2,), (4,), (6,), (8,)]

# Subquery of interest: look for the row just before IDs 4 and 8.
sub_query_txt = (
        'SELECT t2.id '
        'FROM tbl t1, tbl t2 '
        'WHERE t2.id = ( '
        ' SELECT t3.id from tbl t3 '
        ' WHERE t3.id < t1.id '
        ' ORDER BY t3.id DESC '
        ' LIMIT 1) '
        'AND t1.id IN (4, 8)')
print session.execute(sub_query_txt).fetchall()
# output: [(2,), (6,)]

# Full query of interest: get the rows mentioned above, as well as more rows.
query_txt = (
        'SELECT * '
        'FROM tbl '
        'WHERE ( '
        ' id IN (%s) '
        'OR id IN (0, 8))'
        ) % sub_query_txt
print session.execute(query_txt).fetchall()
# output: [(0,), (2,), (6,), (8,)]

# Attempt at an SQLAlchemy translation (from innermost sub-query to full query).
t1 = table.alias('t1')
t2 = table.alias('t2')
t3 = table.alias('t3')
q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
             limit(1)
q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
q3 = session.query(table).filter(
                               or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
print list(q3)
# output: [(0,), (6,), (8,)]

What you are missing is a correlation between the innermost sub-query and the next level up; without the correlation, SQLAlchemy will include the t1 alias in the innermost sub-query:

>>> print str(q1)
SELECT t3.id AS t3_id 
FROM tbl AS t3, tbl AS t1 
WHERE t3.id < t1.id ORDER BY t3.id DESC
 LIMIT ? OFFSET ?
>>> print str(q1.correlate(t1))
SELECT t3.id AS t3_id 
FROM tbl AS t3 
WHERE t3.id < t1.id ORDER BY t3.id DESC
 LIMIT ? OFFSET ?

Note that tbl AS t1 is now missing from the query. From the .correlate() method documentation:

Return a Query construct which will correlate the given FROM clauses to that of an enclosing Query or select().

Thus, t1 is assumed to be part of the enclosing query, and isn't listed in the query itself.

Now your query works:

>>> q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
...              limit(1).correlate(t1)
>>> q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
>>> q3 = session.query(table).filter(
...                                or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
>>> print list(q3)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine SELECT tbl.id AS tbl_id 
FROM tbl 
WHERE tbl.id IN (SELECT t2.id AS t2_id 
FROM tbl AS t2, tbl AS t1 
WHERE t2.id = (SELECT t3.id AS t3_id 
FROM tbl AS t3 
WHERE t3.id < t1.id ORDER BY t3.id DESC
 LIMIT ? OFFSET ?) AND t1.id IN (?, ?)) OR tbl.id IN (?, ?)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine (1, 0, 4, 8, 0, 8)
[(0,), (2,), (6,), (8,)]
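
The effect of the correlation can also be reproduced without SQLAlchemy. The following is a minimal sketch using the standard-library sqlite3 module on the same five rows; the QUERY template and the run() helper are illustrative names, not part of the original code. It runs the full query twice, changing only the innermost FROM clause:

```python
import sqlite3

# In-memory table holding the IDs from the question: 0, 2, 4, 6, 8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)", [(i,) for i in range(0, 10, 2)])

# Full query; only the FROM clause of the innermost sub-query varies.
QUERY = """
SELECT tbl.id FROM tbl
WHERE tbl.id IN (
    SELECT t2.id FROM tbl AS t2, tbl AS t1
    WHERE t2.id = (
        SELECT t3.id FROM {inner_from}
        WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1
    )
    AND t1.id IN (4, 8)
)
OR tbl.id IN (0, 8)
"""


def run(inner_from):
    """Run the query with the given innermost FROM clause; return sorted IDs."""
    rows = conn.execute(QUERY.format(inner_from=inner_from)).fetchall()
    return sorted(row[0] for row in rows)


# Extra "tbl AS t1" shadows the outer t1, so the sub-query is not correlated.
uncorrelated = run("tbl AS t3, tbl AS t1")
# Without it, t1 refers to the enclosing query's row.
correlated = run("tbl AS t3")

print(uncorrelated)  # [0, 6, 8]
print(correlated)    # [0, 2, 6, 8]
```

With the extra alias, the innermost sub-query always returns 6 (the largest ID that is below any other ID), which is why ID 2 goes missing from the uncorrelated result.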

I'm only somewhat sure I understand the query you're asking for. Let's break it down, though:

the goal from this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8.

It looks like you want to query for two kinds of things and then combine them. The proper operator for that is union. Do the simple queries and add them up at the end. I'll start with the second bit, "IDs just before X".

To start with, let's look at all the IDs that are before some given value. For this, we'll join the table to itself with a <:

# select t1.id t1_id, t2.id t2_id from tbl t1 join tbl t2 on t1.id < t2.id;
 t1_id | t2_id 
-------+-------
     0 |     2
     0 |     4
     0 |     6
     0 |     8
     2 |     4
     2 |     6
     2 |     8
     4 |     6
     4 |     8
     6 |     8
(10 rows)

That certainly gives us all of the pairs of rows where the left is less than the right. Of all of them, we want, for a given t2_id, the row whose t1_id is as high as possible; we'll group by t2_id and select the maximum t1_id:

# select max(t1.id), t2.id from tbl t1 join tbl t2 on t1.id < t2.id group by t2.id;
 max | id 
-----+-------
   0 |     2
   2 |     4
   4 |     6
   6 |     8
(4 rows)

Your query, using a limit, could achieve this, but it's usually a good idea to avoid this technique when alternatives exist, because partitioning does not have good, portable support across database implementations. SQLite can use this technique, but PostgreSQL doesn't like it; it uses a technique called "analytic queries" (which are both standardised and more general). MySQL can do neither. The above query, though, works consistently across all SQL database engines.
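
As a quick check of the grouped self-join, here is a small sketch using Python's standard-library sqlite3 module; the predecessors() helper is a hypothetical name introduced only for this illustration. It restricts the join to the target IDs and maps each one to the largest ID strictly below it:

```python
import sqlite3

# Same data as in the question: IDs 0, 2, 4, 6, 8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)", [(i,) for i in range(0, 10, 2)])


def predecessors(targets):
    """Map each target ID to the largest ID strictly below it."""
    placeholders = ", ".join("?" for _ in targets)
    sql = (
        "SELECT t2.id, MAX(t1.id) "
        "FROM tbl t1 JOIN tbl t2 ON t1.id < t2.id "
        "WHERE t2.id IN (%s) "
        "GROUP BY t2.id" % placeholders
    )
    # Each row is (target_id, predecessor_id), so dict() builds the mapping.
    return dict(conn.execute(sql, list(targets)).fetchall())


result = predecessors([4, 8])
print(result)  # {4: 2, 8: 6}
```

A target with no smaller ID (0 in this data) produces no row from the join, so it would simply be absent from the mapping.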

The rest of the work is just using in or other equivalent filtering queries, which are not difficult to express in SQLAlchemy. The boilerplate...

>>> import sqlalchemy as sa
>>> from sqlalchemy.orm import Query
>>> engine = sa.create_engine('sqlite:///:memory:')
>>> meta = sa.MetaData(bind=engine)
>>> table = sa.Table('tbl', meta, sa.Column('id', sa.Integer))
>>> meta.create_all()

>>> table.insert().execute([{'id':i} for i in range(0, 10, 2)])

>>> t1 = table.alias()
>>> t2 = table.alias()

>>> before_filter = [4, 8]

The first interesting bit is that we give the 'max(id)' expression a name. This is needed so that we can refer to it more than once, and to lift it out of a subquery.

>>> c1 = sa.func.max(t1.c.id).label('max_id')
>>> #                                ^^^^^^

The 'heavy lifting' portion of the query: join the above aliases, group, and select the max:

>>> q1 = Query([c1, t2.c.id]) \
...      .join((t2, t1.c.id < t2.c.id)) \
...      .group_by(t2.c.id) \
...      .filter(t2.c.id.in_(before_filter))

Because we'll be using a union, we need this to produce the right number of fields: we wrap it in a subquery and project down to the only column we're interested in. This will have the name we gave it in the above label() call.

>>> q2 = Query(q1.subquery().c.max_id)
>>> #                          ^^^^^^

The other half of the union is much simpler:

>>> t3 = table.alias()
>>> exact_filter = [0, 8]
>>> q3 = Query(t3).filter(t3.c.id.in_(exact_filter))

All that's left is to combine them:

>>> q4 = q2.union(q3)
>>> engine.execute(q4.statement).fetchall()
[(0,), (2,), (6,), (8,)]
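
For reference, the union can also be written as a single plain SQL statement. The sketch below uses the standard-library sqlite3 module; UNION_SQL is an illustrative name, and the statement is only roughly the shape of what SQLAlchemy emits for q4, not its exact output:

```python
import sqlite3

# Same data as in the question: IDs 0, 2, 4, 6, 8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)", [(i,) for i in range(0, 10, 2)])

# First half: the ID just before each of 4 and 8 (grouped self-join).
# Second half: the exact IDs 0 and 8. UNION also removes duplicates.
UNION_SQL = """
SELECT MAX(t1.id) AS id
FROM tbl t1 JOIN tbl t2 ON t1.id < t2.id
WHERE t2.id IN (4, 8)
GROUP BY t2.id
UNION
SELECT t3.id FROM tbl t3 WHERE t3.id IN (0, 8)
"""

# Sort in Python rather than relying on the order the database returns.
ids = sorted(row[0] for row in conn.execute(UNION_SQL))
print(ids)  # [0, 2, 6, 8]
```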

The responses here helped me fix my issue, but in my case I had to use both correlate() and subquery():

# ...
subquery = subquery.correlate(OuterCorrelationTable).subquery()
filter_query = db.session.query(func.sum(subquery.c.some_count_column))
filter = filter_query.as_scalar() == as_many_as_some_param
# ...
final_query = db.session.query(OuterCorrelationTable).filter(filter)
