Convert lateral join query to sqlalchemy
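I'm having trouble converting a query I wrote in SQL (Postgres) to sqlalchemy. In particular, my attempt at the mapping in sqlalchemy produces an absurd recursive result that would run far slower than what I originally wrote.
Given the following kind of table structure: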
metadata
------------------------------
primary_id - integer
secondary_count - integer
property - string (many to each primary_id)
data
-----------------------------
primary_id - integer
secondary_id - integer (many to each primary_id)
primary_json - json bytes
secondary_json - json bytes
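I am trying to retrieve pairs of primary and secondary data in ways like this:
The first is easily done with a join between the two tables, but the second is more complicated. The solution I used in raw SQL is as follows (see here for an explanation of what led me to this solution):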
SELECT primary_id, primary_json, secondary_json, secondary_count
FROM
(
SELECT primary_id, secondary_count
FROM metadata
WHERE property='whatever I want'
-- Get the "best" 1000 results
ORDER BY secondary_count DESC
LIMIT 1000
) my_primary_ids
LEFT OUTER JOIN LATERAL
(
SELECT primary_json, secondary_json
FROM data
WHERE primary_id = my_primary_ids.primary_id
-- Only return 10 pieces of secondary json per primary json
LIMIT 10
) json_content ON true;
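I have done my best to convert this to sqlalchemy, but I keep running into the problem that the resulting query rewrites the first subquery inside the FROM clause of the lateral subquery.
For example, the sqlalchemy code below (assuming table object definitions that match the tables above) is a partial solution. I think I can add the missing columns myself (you will see which ones in the generated SQL):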
from sqlalchemy import true
my_primary_ids_al = (
query(Metadata.primary_id.label('primary_id'),
Metadata.secondary_count.label('secondary_count'))
.filter_by(property='whatever I want')
.order_by(Metadata.secondary_count.desc())
.limit(1000)
.from_self()
.subquery('my_primary_ids')
)
json_content_al = (
query(Data.primary_json.label('primary_json'),
Data.secondary_json.label('secondary_json'))
.filter_by(primary_id=my_primary_ids_al.c.primary_id)
.limit(10)
.from_self()
.subquery('json_content')
.lateral()
)
joined_query = (
my_primary_ids_al
.outerjoin(json_content_al, true())
.subquery('joined_query')
)
The joined query in long form is as follows, with the absurd nested structure mentioned above:
SELECT anon_1.primary_id, anon_1.secondary_count
FROM
(
SELECT metadata.primary_id AS primary_id,
metadata.secondary_count AS secondary_count
FROM metadata
WHERE metadata.property = 'whatever I want'
ORDER BY metadata.secondary_count DESC
LIMIT :param_1
) AS anon_1
LEFT OUTER JOIN LATERAL
(
SELECT anon_4.anon_3_secondary_json AS anon_3_secondary_json,
anon_4.anon_3_primary_json AS anon_3_primary_json,
FROM
(
SELECT anon_3.secondary_json AS anon_3_secondary_json,
anon_3.primary_json AS anon_3_primary_json,
FROM
(
SELECT data.secondary_json AS secondary_json,
data.primary_json AS primary_json,
FROM data
JOIN
(
SELECT anon_1.primary_id AS primary_id,
anon_1.secondary_count AS secondary_count
FROM
(
SELECT metadata.primary_id AS primary_id,
metadata.secondary_count AS secondary_count
FROM metadata
WHERE metadata.property = 'whatever I want'
ORDER BY metadata.secondary_count DESC
LIMIT :param_1
) AS anon_1
) AS primary_ids ON data.primary_id = primary_ids.primary_id
) AS anon_3
LIMIT :param_2) AS anon_4) AS anon_2 ON true
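Again, I realize this is an incomplete attempt, since not all of the columns are selected at the top level, but the key issue is that sqlalchemy creates a large number of nested queries inside the lateral join subquery. That is the core problem I have not been able to solve, and until it is solved there is little point in finishing the rest of the query.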
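You don't need both from_self() and subquery(). In this case the former messes up auto-correlation and causes the wild recursive query, because the compiler treats the references to the first subquery inside and outside the second subquery as separate entities. Simply remove the calls to from_self() and the query comes out as intended.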
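As a rough sketch of that fix, the version below builds both subqueries without from_self() and compiles the statement so the result can be inspected. It assumes SQLAlchemy 1.4 or later and uses Core Table objects and select() in place of the question's mapped classes and query(). The table definitions and the names metadata_table, data_table, stmt, and sql are illustrative and do not come from the original post.

```python
from sqlalchemy import JSON, Column, Integer, MetaData, String, Table, select, true
from sqlalchemy.dialects import postgresql

# Hypothetical table definitions mirroring the structure in the question.
meta = MetaData()
metadata_table = Table(
    "metadata", meta,
    Column("primary_id", Integer),
    Column("secondary_count", Integer),
    Column("property", String),
)
data_table = Table(
    "data", meta,
    Column("primary_id", Integer),
    Column("secondary_id", Integer),
    Column("primary_json", JSON),
    Column("secondary_json", JSON),
)

# The "best" 1000 primary ids: a plain subquery, with no from_self().
my_primary_ids = (
    select(metadata_table.c.primary_id, metadata_table.c.secondary_count)
    .where(metadata_table.c.property == "whatever I want")
    .order_by(metadata_table.c.secondary_count.desc())
    .limit(1000)
    .subquery("my_primary_ids")
)

# At most 10 rows of JSON per primary id, referring to the subquery above.
json_content = (
    select(data_table.c.primary_json, data_table.c.secondary_json)
    .where(data_table.c.primary_id == my_primary_ids.c.primary_id)
    .limit(10)
    .lateral("json_content")
)

# LEFT OUTER JOIN LATERAL ... ON true, selecting all four columns.
stmt = select(
    my_primary_ids.c.primary_id,
    json_content.c.primary_json,
    json_content.c.secondary_json,
    my_primary_ids.c.secondary_count,
).select_from(my_primary_ids.outerjoin(json_content, true()))

# Compile for PostgreSQL; no database connection is needed for this.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

When auto-correlation works as described, the printed statement should contain the metadata subquery only once, with the LATERAL subquery selecting from data alone.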
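What happens is that calling from_self() creates a new Query that selects from the SELECT statement of the previous Query. Applying subquery() then creates a subquery from that, giving 2 levels of nesting. Of course that subquery has to be used in yet another query, so there are at least 3 levels of nesting. And when auto-correlation fails and the subquery is included as is in the second query, you get the deeply nested query.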
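The nesting count can be illustrated with a small sketch. It assumes SQLAlchemy 1.4 or later and emulates from_self() by hand with Core select(), selecting from a subquery of the previous statement, rather than calling Query.from_self() itself. The names base, wrapped, outer, and nested_sql are illustrative only.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select

# Hypothetical table definition mirroring the question's metadata table.
metadata_table = Table(
    "metadata", MetaData(),
    Column("primary_id", Integer),
    Column("secondary_count", Integer),
    Column("property", String),
)

base = (
    select(metadata_table.c.primary_id, metadata_table.c.secondary_count)
    .where(metadata_table.c.property == "whatever I want")
    .limit(1000)
)

# Roughly what from_self() does: select from the previous SELECT statement.
wrapped = select(base.subquery())

# subquery() then wraps that SELECT once more: 2 levels of nesting.
my_primary_ids = wrapped.subquery("my_primary_ids")

# Using the subquery requires an enclosing SELECT: the third level.
outer = select(my_primary_ids.c.primary_id)

nested_sql = str(outer)
print(nested_sql)

# Without the from_self()-style wrap there is one SELECT fewer.
flat_sql = str(select(base.subquery("my_primary_ids").c.primary_id))
print(flat_sql)
```

Comparing the two printed statements shows the extra SELECT level that the from_self()-style wrap adds before any correlation problem is involved.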