How to specify the FROM tables in SQLAlchemy subqueries?

Question

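I am trying to fetch, in a single query, a fixed set of rows plus some other rows found by a subquery. My problem is that the query generated by my SQLAlchemy code is incorrect.

SQLAlchemy generates the following query: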

SELECT tbl.id AS tbl_id
FROM tbl
WHERE tbl.id IN
(
SELECT t2.id AS t2_id
FROM tbl AS t2, tbl AS t1
WHERE t2.id =
(
SELECT t3.id AS t3_id
FROM tbl AS t3, tbl AS t1
WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1 OFFSET 0
)
AND t1.id IN (4, 8)
)
OR tbl.id IN (0, 8)

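The correct query should not have the second tbl AS t1, the one inside the innermost subquery. The goal of this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8.

Unfortunately, I can't find how to get SQLAlchemy to generate the correct query (see the code below).

Suggestions for a simpler query that achieves the same result are also welcome. They need to be efficient, though: I tried a few variants, and some were a lot slower on my real use case.

The code generating the query: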

from sqlalchemy import create_engine, or_
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:', echo=True)
meta = MetaData(bind=engine)
table = Table('tbl', meta, Column('id', Integer))
session = sessionmaker(bind=engine)()
meta.create_all()

# Insert IDs 0, 2, 4, 6, 8.
i = table.insert()
i.execute(*[dict(id=i) for i in range(0, 10, 2)])
print session.query(table).all()
# output: [(0,), (2,), (4,), (6,), (8,)]

# Subquery of interest: look for the row just before IDs 4 and 8.
sub_query_txt = (
        'SELECT t2.id '
        'FROM tbl t1, tbl t2 '
        'WHERE t2.id = ( '
        ' SELECT t3.id from tbl t3 '
        ' WHERE t3.id < t1.id '
        ' ORDER BY t3.id DESC '
        ' LIMIT 1) '
        'AND t1.id IN (4, 8)')
print session.execute(sub_query_txt).fetchall()
# output: [(2,), (6,)]

# Full query of interest: get the rows mentioned above, as well as more rows.
query_txt = (
        'SELECT * '
        'FROM tbl '
        'WHERE ( '
        ' id IN (%s) '
        'OR id IN (0, 8))'
        ) % sub_query_txt
print session.execute(query_txt).fetchall()
# output: [(0,), (2,), (6,), (8,)]

# Attempt at an SQLAlchemy translation (from innermost sub-query to full query).
t1 = table.alias('t1')
t2 = table.alias('t2')
t3 = table.alias('t3')
q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
             limit(1)
q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
q3 = session.query(table).filter(
                               or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
print list(q3)
# output: [(0,), (6,), (8,)]

Answer 1

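What you are missing is a correlation between the innermost subquery and the next level up. Without the correlation, SQLAlchemy includes the t1 alias in the FROM clause of the innermost subquery: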

>>> print str(q1)
SELECT t3.id AS t3_id 
FROM tbl AS t3, tbl AS t1 
WHERE t3.id < t1.id ORDER BY t3.id DESC
 LIMIT ? OFFSET ?
>>> print str(q1.correlate(t1))
SELECT t3.id AS t3_id 
FROM tbl AS t3 
WHERE t3.id < t1.id ORDER BY t3.id DESC
 LIMIT ? OFFSET ?

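Note that tbl AS t1 is now missing from the query. From the .correlate() method documentation:

Return a Query construct which will correlate the given FROM clauses to that of an enclosing Query or select().

Thus, t1 is assumed to be part of the enclosing query, and is not listed in the FROM clause of the subquery itself.

Now your query works: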

>>> q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
...              limit(1).correlate(t1)
>>> q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
>>> q3 = session.query(table).filter(
...                                or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
>>> print list(q3)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine SELECT tbl.id AS tbl_id 
FROM tbl 
WHERE tbl.id IN (SELECT t2.id AS t2_id 
FROM tbl AS t2, tbl AS t1 
WHERE t2.id = (SELECT t3.id AS t3_id 
FROM tbl AS t3 
WHERE t3.id < t1.id ORDER BY t3.id DESC
 LIMIT ? OFFSET ?) AND t1.id IN (?, ?)) OR tbl.id IN (?, ?)
2012-10-24 22:16:22,239 INFO sqlalchemy.engine.base.Engine (1, 0, 4, 8, 0, 8)
[(0,), (2,), (6,), (8,)]
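
To see why the stray tbl AS t1 matters, the two SQL forms can be compared directly, outside SQLAlchemy. The following is a minimal sketch using Python's standard sqlite3 module with the same five rows as in the question; the variable names are only illustrative. With its own t1 in the FROM clause, the innermost subquery no longer depends on the outer row and always returns the same single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)",
                 [(i,) for i in range(0, 10, 2)])

# Innermost subquery lists its own "tbl AS t1": it is NOT correlated
# with the outer t1, so it returns one fixed value for every outer row.
uncorrelated = """
SELECT t2.id FROM tbl AS t2, tbl AS t1
WHERE t2.id = (
    SELECT t3.id FROM tbl AS t3, tbl AS t1
    WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1
)
AND t1.id IN (4, 8)
"""

# Innermost subquery refers to the outer t1: it is evaluated once per
# outer row, which is what .correlate(t1) asks SQLAlchemy to emit.
correlated = """
SELECT t2.id FROM tbl AS t2, tbl AS t1
WHERE t2.id = (
    SELECT t3.id FROM tbl AS t3
    WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1
)
AND t1.id IN (4, 8)
"""

uncorrelated_ids = sorted({row[0] for row in conn.execute(uncorrelated)})
correlated_ids = sorted({row[0] for row in conn.execute(correlated)})
print(uncorrelated_ids)  # [6]
print(correlated_ids)    # [2, 6]
```

The uncorrelated form explains the [(0,), (6,), (8,)] result in the question: only 6 comes out of the subquery, and 0 and 8 come from the fixed list.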

Answer 2

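I'm only fairly sure I understand the query you're asking for. Let's break it down, though:

The goal of this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8.

It looks like you want to query for two kinds of things and then combine them. The proper operator for that is union. Write the simple queries and combine them at the end. I'll start with the second part, "ids just before X".

To start with, let's look at all the ids that come before some given value. For this, we join the table with itself using <: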

# select t1.id t1_id, t2.id t2_id from tbl t1 join tbl t2 on t1.id < t2.id;
 t1_id | t2_id 
-------+-------
     0 |     2
     0 |     4
     0 |     6
     0 |     8
     2 |     4
     2 |     6
     2 |     8
     4 |     6
     4 |     8
     6 |     8
(10 rows)

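That gives us all the pairs of rows where the left id is less than the right id. Of all of them, we want, for a given t2_id, the row whose t1_id is as high as possible. We'll group by t2_id and select the maximum t1_id: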

# select max(t1.id), t2.id from tbl t1 join tbl t2 on t1.id < t2.id group by t2.id;
 max | id 
-----+----
   0 |  2
   2 |  4
   4 |  6
   6 |  8
(4 rows)

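Your query using limit could achieve this, but it is generally a good idea to avoid that technique when alternatives exist, because partitioning does not have good, portable support across database implementations. SQLite can use this technique, but PostgreSQL doesn't like it; it uses a technique called "analytic queries" (which is both standardised and more general). MySQL can do neither. The query above, though, works consistently across all SQL database engines.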
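
As a quick check that the two formulations agree on this data, here is a small sketch using Python's standard sqlite3 module. The LIMIT-based statement is a simplified form of the one in the question (the scalar subquery is placed in the select list), and the prev_id alias and variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)",
                 [(i,) for i in range(0, 10, 2)])

# "Row just before" via a correlated subquery with ORDER BY ... LIMIT 1.
limit_sql = """
SELECT (SELECT t3.id FROM tbl AS t3
        WHERE t3.id < t1.id
        ORDER BY t3.id DESC LIMIT 1) AS prev_id
FROM tbl AS t1
WHERE t1.id IN (4, 8)
"""

# Same rows via a self-join on <, grouped by the right-hand id.
group_sql = """
SELECT max(t1.id) AS prev_id
FROM tbl AS t1 JOIN tbl AS t2 ON t1.id < t2.id
WHERE t2.id IN (4, 8)
GROUP BY t2.id
"""

limit_ids = sorted(row[0] for row in conn.execute(limit_sql))
group_ids = sorted(row[0] for row in conn.execute(group_sql))
print(limit_ids)  # [2, 6]
print(group_ids)  # [2, 6]
```

Which of the two is faster on a large table depends on the engine and the available indexes, so it is worth measuring both on real data.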

The rest of the work is just filtering with in or an equivalent, which is not difficult to express in SQLAlchemy. First, the boilerplate:

>>> import sqlalchemy as sa
>>> from sqlalchemy.orm import Query
>>> engine = sa.create_engine('sqlite:///:memory:')
>>> meta = sa.MetaData(bind=engine)
>>> table = sa.Table('tbl', meta, sa.Column('id', sa.Integer))
>>> meta.create_all()

>>> table.insert().execute([{'id':i} for i in range(0, 10, 2)])

>>> t1 = table.alias()
>>> t2 = table.alias()

>>> before_filter = [4, 8]

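The first interesting bit is that we give the 'max(id)' expression a name. This is needed so that we can refer to it more than once, and so that we can lift it out of a subquery.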

>>> c1 = sa.func.max(t1.c.id).label('max_id')
>>> #                                ^^^^^^

The "heavy lifting" part of the query: join the aliases above, group, and select the maximum.

>>> q1 = Query([c1, t2.c.id]) \
...      .join((t2, t1.c.id < t2.c.id)) \
...      .group_by(t2.c.id) \
...      .filter(t2.c.id.in_(before_filter))

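Because we'll be using a union, both halves need to produce the same number of columns. We wrap the query in a subquery and project down to the only column we're interested in. That column has the name we gave it in the label() call above.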

>>> q2 = Query(q1.subquery().c.max_id)
>>> #                          ^^^^^^

The other half of the union is much simpler:

>>> t3 = table.alias()
>>> exact_filter = [0, 8]
>>> q3 = Query(t3).filter(t3.c.id.in_(exact_filter))

All that's left is to combine them:

>>> q4 = q2.union(q3)
>>> engine.execute(q4.statement).fetchall()
[(0,), (2,), (6,), (8,)]
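
For reference, the SQL this construction is aiming at can also be written by hand. The sketch below runs it with Python's standard sqlite3 module; the prev alias and the ORDER BY are my additions for a stable result, and the exact statement SQLAlchemy emits for q4 may be shaped differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)",
                 [(i,) for i in range(0, 10, 2)])

union_sql = """
SELECT prev.max_id AS id
FROM (
    -- ids just before 4 and 8: self-join on <, keep the largest per group
    SELECT max(t1.id) AS max_id, t2.id AS t2_id
    FROM tbl AS t1 JOIN tbl AS t2 ON t1.id < t2.id
    WHERE t2.id IN (4, 8)
    GROUP BY t2.id
) AS prev
UNION
-- the fixed ids
SELECT t3.id FROM tbl AS t3 WHERE t3.id IN (0, 8)
ORDER BY 1
"""

rows = conn.execute(union_sql).fetchall()
print(rows)  # [(0,), (2,), (6,), (8,)]
```

Note that UNION removes duplicates, so an id that is both in the fixed list and "just before" another id appears only once.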

Answer 3

The replies here helped me fix my issue, but in my case I had to use both correlate() and subquery():

# ...
subquery = subquery.correlate(OuterCorrelationTable).subquery()
filter_query = db.session.query(func.sum(subquery.c.some_count_column))
filter = filter_query.as_scalar() == as_many_as_some_param
# ...
final_query = db.session.query(OuterCorrelationTable).filter(filter)

Answer 1
2 · accepted · 2012-10-24 20:16:48

Answer 2
1 · 2012-10-24 20:45:19

Answer 3
0 · 2020-02-17 10:51:39
