sqlalchemy FULL OUTER JOIN

Question

How can I implement a FULL OUTER JOIN in SQLAlchemy at the ORM level?

Here is my code:

q1 = (db.session.query(
        tb1.user_id.label('u_id'),
        func.count(tb1.id).label('tb1_c')
    )
    .group_by(tb1.user_id)
)
q2 = (db.session.query(
        tb2.user_id.label('u_id'),
        func.count(tb2.id).label('tb2_c')
    )
    .group_by(tb2.user_id)
)

These are my two queries, and I want to apply a FULL OUTER JOIN to them.

Answer 1

Since version 1.1, SQLAlchemy supports FULL OUTER JOIN through the full=True parameter of Query.join(). See here: https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.join.params.full

So for your code you would want to do:

q1 = (db.session.query(
        tb1.user_id.label('u_id'),
        func.count(tb1.id).label('tb1_c')
    )
    .group_by(tb1.user_id)
).cte('q1')

q2 = (db.session.query(
        tb2.user_id.label('u_id'),
        func.count(tb2.id).label('tb2_c')
    )
    .group_by(tb2.user_id)
).cte('q2')

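# Columns of a CTE are reached through its .c collection.
result = db.session.query(
    func.coalesce(q1.c.u_id, q2.c.u_id).label('u_id'),
    q1.c.tb1_c,
    q2.c.tb2_c
).select_from(q1).join(  # make q1 the explicit left side of the join
    q2,
    q1.c.u_id == q2.c.u_id,
    full=True
)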

Note that as with any FULL OUTER JOIN, tb1_c and tb2_c may be null so you might want to apply a coalesce on them.
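To illustrate the coalesce on the counts, here is a minimal sketch using SQLAlchemy Core instead of the Flask session from the question. It assumes SQLAlchemy 1.4 or later (for the positional select() style), and the table definitions for tb1 and tb2 are hypothetical stand-ins for the real models. It only compiles the statement to a string, so no database is needed.

```python
from sqlalchemy import Column, Integer, MetaData, Table, func, select

metadata = MetaData()

# Hypothetical table definitions standing in for the tb1/tb2 models.
tb1 = Table("tb1", metadata,
            Column("id", Integer, primary_key=True),
            Column("user_id", Integer))
tb2 = Table("tb2", metadata,
            Column("id", Integer, primary_key=True),
            Column("user_id", Integer))

# Per-user counts for each table, wrapped as CTEs.
q1 = (select(tb1.c.user_id.label("u_id"),
             func.count(tb1.c.id).label("tb1_c"))
      .group_by(tb1.c.user_id)
      .cte("q1"))
q2 = (select(tb2.c.user_id.label("u_id"),
             func.count(tb2.c.id).label("tb2_c"))
      .group_by(tb2.c.user_id)
      .cte("q2"))

# FULL OUTER JOIN of the two CTEs; coalesce turns missing counts into 0.
stmt = select(
    func.coalesce(q1.c.u_id, q2.c.u_id).label("u_id"),
    func.coalesce(q1.c.tb1_c, 0).label("tb1_c"),
    func.coalesce(q2.c.tb2_c, 0).label("tb2_c"),
).select_from(q1.join(q2, q1.c.u_id == q2.c.u_id, full=True))

sql = str(stmt)  # compiled with the default dialect
print(sql)
```

Whether the statement actually runs depends on the backend: the database itself has to support FULL OUTER JOIN.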

Answer 2

First of all, sqlalchemy does not support FULL JOIN out of the box, and for some good reasons. So any solution proposed will consist of two parts:

a work-around for missing functionality
sqlalchemy syntax to build a query for that work-around

Now, for the reasons to avoid FULL JOIN, please read the old blog post "Better Alternatives to a FULL OUTER JOIN". From that post I take the idea of avoiding the FULL JOIN by adding 0 values for the missing columns and aggregating (SUM) over a UNION ALL instead. The SA code might look something like this:

from sqlalchemy import func, literal, select, union_all

q1 = (session.query(
        tb1.user_id.label('u_id'),
        func.count(tb1.id).label('tb1_c'),
        literal(0).label('tb2_c'), # @NOTE: added 0
      ).group_by(tb1.user_id))
q2 = (session.query(
        tb2.user_id.label('u_id'),
        literal(0).label('tb1_c'), # @NOTE: added 0
        func.count(tb2.id).label('tb2_c')
      ).group_by(tb2.user_id))

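qt = union_all(q1, q2).alias("united")
# Label the sums so the result columns keep the names tb1_c and tb2_c.
qr = select([qt.c.u_id, func.sum(qt.c.tb1_c).label('tb1_c'), func.sum(qt.c.tb2_c).label('tb2_c')]).group_by(qt.c.u_id)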
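To see what the UNION ALL work-around returns, the sketch below runs the equivalent raw SQL against an in-memory SQLite database using only the standard library. The table layout and sample rows are assumptions made up for illustration; only the shape of the query follows the code above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample data: user 1 appears only in tb1,
# user 3 only in tb2, and user 2 in both.
conn.executescript("""
    CREATE TABLE tb1 (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE tb2 (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO tb1 (user_id) VALUES (1), (1), (2);
    INSERT INTO tb2 (user_id) VALUES (2), (3);
""")

# Each branch pads the other table's count with 0; the outer query
# sums both columns per user, emulating a FULL OUTER JOIN.
UNION_ALL_SQL = """
    SELECT u_id, SUM(tb1_c) AS tb1_c, SUM(tb2_c) AS tb2_c
    FROM (
        SELECT user_id AS u_id, COUNT(id) AS tb1_c, 0 AS tb2_c
        FROM tb1 GROUP BY user_id
        UNION ALL
        SELECT user_id AS u_id, 0 AS tb1_c, COUNT(id) AS tb2_c
        FROM tb2 GROUP BY user_id
    ) AS united
    GROUP BY u_id
    ORDER BY u_id
"""

rows = conn.execute(UNION_ALL_SQL).fetchall()
print(rows)
```

Because the missing side is filled with 0 rather than NULL, no coalesce is needed on the counts.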

Having composed the query above, I might actually consider other options:

Simply execute the two queries separately and aggregate the results in Python itself (for result sets that are not too large).
Given that this looks like reporting functionality rather than a business-model workflow, write the SQL query by hand and execute it directly via the engine (only if it really performs much better, though).
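The first option, aggregating in Python, could be sketched as follows. The helper merge_counts is a hypothetical name, and the two input lists stand in for what q1.all() and q2.all() would return, assumed here to be plain (user_id, count) tuples.

```python
def merge_counts(rows1, rows2):
    """Combine two (user_id, count) result sets like a FULL OUTER JOIN on user_id."""
    counts1 = dict(rows1)
    counts2 = dict(rows2)
    merged = []
    # The union of both key sets keeps users that appear on only one side.
    for u_id in sorted(counts1.keys() | counts2.keys()):
        merged.append((u_id, counts1.get(u_id, 0), counts2.get(u_id, 0)))
    return merged


# Stand-ins for the results of the two grouped queries.
tb1_rows = [(1, 2), (2, 1)]
tb2_rows = [(2, 1), (3, 1)]

result = merge_counts(tb1_rows, tb2_rows)
print(result)
```

Missing counts default to 0 here; use None instead if the distinction between "no rows" and "zero" matters.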

solution1
8 2020-05-24 01:12:50

solution2
6 2013-12-04 22:13:23