
sqlalchemy FULL OUTER JOIN

How can I implement a FULL OUTER JOIN in SQLAlchemy at the ORM level?

Here is my code:

q1 = (db.session.query(
        tb1.user_id.label('u_id'),
        func.count(tb1.id).label('tb1_c')
    )
    .group_by(tb1.user_id)
)
q2 = (db.session.query(
        tb2.user_id.label('u_id'),
        func.count(tb2.id).label('tb2_c')
    )
    .group_by(tb2.user_id)
)

I want to apply a FULL OUTER JOIN to the two queries above.

Since version 1.1, SQLAlchemy supports FULL OUTER JOIN through the full=True parameter of Query.join(). See here: https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.join.params.full

So for your code you would want to do:

q1 = (db.session.query(
        tb1.user_id.label('u_id'),
        func.count(tb1.id).label('tb1_c')
    )
    .group_by(tb1.user_id)
).cte('q1')

q2 = (db.session.query(
        tb2.user_id.label('u_id'),
        func.count(tb2.id).label('tb2_c')
    )
    .group_by(tb2.user_id)
).cte('q2')

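# Columns of a CTE are reached through its .c collection.
result = db.session.query(
    func.coalesce(q1.c.u_id, q2.c.u_id).label('u_id'),
    q1.c.tb1_c,
    q2.c.tb2_c
).select_from(
    q1  # make the left side of the join explicit
).join(
    q2,
    q1.c.u_id == q2.c.u_id,
    full=True
)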

Note that as with any FULL OUTER JOIN, tb1_c and tb2_c may be null so you might want to apply a coalesce on them.
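As a minimal sketch of that last point, the counts can be wrapped in coalesce so that users missing from one side get 0 instead of NULL. The lightweight table() objects below are stand-ins for the real tb1 and tb2 models, and the session is deliberately unbound, so this only builds and prints the SQL rather than executing it:

```python
from sqlalchemy import column, func, table
from sqlalchemy.orm import Session

# Stand-ins for the real models (assumption: only id and user_id matter here).
tb1 = table("tb1", column("id"), column("user_id"))
tb2 = table("tb2", column("id"), column("user_id"))

session = Session()  # unbound: enough to build SQL, not to run it

q1 = (
    session.query(
        tb1.c.user_id.label("u_id"),
        func.count(tb1.c.id).label("tb1_c"),
    )
    .group_by(tb1.c.user_id)
    .cte("q1")
)
q2 = (
    session.query(
        tb2.c.user_id.label("u_id"),
        func.count(tb2.c.id).label("tb2_c"),
    )
    .group_by(tb2.c.user_id)
    .cte("q2")
)

query = (
    session.query(
        func.coalesce(q1.c.u_id, q2.c.u_id).label("u_id"),
        func.coalesce(q1.c.tb1_c, 0).label("tb1_c"),  # 0 when the user has no tb1 rows
        func.coalesce(q2.c.tb2_c, 0).label("tb2_c"),  # 0 when the user has no tb2 rows
    )
    .select_from(q1)
    .join(q2, q1.c.u_id == q2.c.u_id, full=True)
)

sql = str(query)
print(sql)
```

Whether the statement actually runs depends on the database: the backend has to support FULL OUTER JOIN itself.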

First of all, sqlalchemy does not support FULL JOIN out of the box, and for some good reasons. So any solution proposed will consist of two parts:

  1. a work-around for missing functionality
  2. sqlalchemy syntax to build a query for that work-around

Now, for the reasons to avoid FULL JOIN, please read the older blog post "Better Alternatives to a FULL OUTER JOIN". From that post I will take the idea of avoiding FULL JOIN by adding 0 values for the missing columns and aggregating (SUM) over a UNION ALL instead. The SA code might look something like this:

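from sqlalchemy import func, literal, select, union_all

# Each query fills the count column it cannot provide with a literal 0.
q1 = (session.query(
        tb1.user_id.label('u_id'),
        func.count(tb1.id).label('tb1_c'),
        literal(0).label('tb2_c'), # @NOTE: added 0
      ).group_by(tb1.user_id))
q2 = (session.query(
        tb2.user_id.label('u_id'),
        literal(0).label('tb1_c'), # @NOTE: added 0
        func.count(tb2.id).label('tb2_c')
      ).group_by(tb2.user_id))

# Stack both result sets, then sum per user to collapse them into one row each.
qt = union_all(q1, q2).alias("united")
# select([...]) is the 1.x list form; on 1.4+ pass the columns positionally.
qr = select([qt.c.u_id, func.sum(qt.c.tb1_c), func.sum(qt.c.tb2_c)]).group_by(qt.c.u_id)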
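To see what this work-around boils down to, here is a small sketch of roughly equivalent SQL, run against an in-memory SQLite database with the standard sqlite3 module. The table layout and sample rows are made up for illustration, and the SQL is hand-written rather than the exact output of the SA code above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb1 (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE tb2 (id INTEGER PRIMARY KEY, user_id INTEGER);
    -- hypothetical sample data: user 1 only in tb1, user 3 only in tb2
    INSERT INTO tb1 (user_id) VALUES (1), (1), (2);
    INSERT INTO tb2 (user_id) VALUES (2), (3), (3), (3);
""")

# Each branch pads the count it cannot provide with 0; the outer query
# sums per user, which merges the two branches into one row per u_id.
UNION_ALL_SQL = """
    SELECT u_id, SUM(tb1_c) AS tb1_c, SUM(tb2_c) AS tb2_c
    FROM (
        SELECT user_id AS u_id, COUNT(id) AS tb1_c, 0 AS tb2_c
        FROM tb1 GROUP BY user_id
        UNION ALL
        SELECT user_id AS u_id, 0 AS tb1_c, COUNT(id) AS tb2_c
        FROM tb2 GROUP BY user_id
    ) AS united
    GROUP BY u_id
    ORDER BY u_id
"""

rows = conn.execute(UNION_ALL_SQL).fetchall()
for row in rows:
    print(row)
```

Users that appear in only one table still get a row, with 0 in the other count, which is the result a FULL OUTER JOIN with coalesce would give.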

Having composed the query above, I actually might consider other options:

  • simply execute the two queries separately and combine the results in Python (for result sets that are not too large)
  • given that this looks like reporting functionality rather than business-model workflow, write the SQL query by hand and execute it directly via the engine (only if it really performs much better, though)
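The first option can be sketched in a few lines of plain Python. The helper name merge_counts and the sample rows are hypothetical; each input is assumed to be a list of (u_id, count) tuples, such as q1.all() and q2.all() would return for the original two queries:

```python
from collections import defaultdict


def merge_counts(rows1, rows2):
    """Combine two lists of (u_id, count) rows into (u_id, tb1_c, tb2_c) rows."""
    merged = defaultdict(lambda: [0, 0])  # a missing side defaults to 0
    for u_id, count in rows1:
        merged[u_id][0] = count
    for u_id, count in rows2:
        merged[u_id][1] = count
    return [(u_id, c1, c2) for u_id, (c1, c2) in sorted(merged.items())]


# Hypothetical results of the two grouped queries.
rows1 = [(1, 2), (2, 1)]
rows2 = [(2, 1), (3, 3)]

result = merge_counts(rows1, rows2)
print(result)
```

This keeps both queries simple and works on any database, at the cost of two round trips and holding both result sets in memory.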

