SQLAlchemy - combining group_by and func.count() with joinedload()

Question

I want to use joined eager loading (for speedup) in a query that filters based on relationships, then groups the results by the value of one column.

For example, here's a table:

PERSON | EMPLOYER | SALAD PREFERENCE
Alice  | bigCorp  | Caesar
Bob    | evilCorp | Greek
Charlo | bigCorp  | Greek
Derek  | evilCorp | Caesar
...

And this has a relationship with the EMPLOYER table:

EMPLOYER | EVILNESS
bigCorp  | NOT_EVIL
evilCorp | SUPER_EVIL
...

And I want to get the counts of the salad preferences among people who work for super evil employers.

So I want to do something like:

salad_preferences_with_counts = {
    salad: count for (salad, count) in
    session.query(
        Person.salad_preference, func.count(Person.salad_preference))
        .group_by(Person.salad_preference)
        .filter(
            Person.employer.evilness == Employer.Evilness.super_evil
        )
        .options(load_only(Person.salad_preference))
        .options(joinedload(Person.employer).load_only(Employer.evilness))
        .all()
}

to generate:

{ "Caesar": 1, "Greek": 1 }

But, with the load_only and joinedload in the mix, it fails with:

sqlalchemy.exc.ArgumentError: Query has only expression-based entities, which do not apply to (column|relationship) property ("Person.salad_preference"|"Person.employer")

It looks like this happens for the same reason that with_entities() breaks joinedload(). Of course, when I take out load_only() and joinedload(), it all runs fine.

Is there a way for me to still use the same group_by and func.count logic to group by salad preference while still using joinedload() for performance optimization? Or is this impossible because it breaks the idea of joinedload() (that it's not supposed to affect query results), or maybe it's just not actually going to generate any performance benefit?

I'm very new to SQLAlchemy so I might be missing something fundamental here. If you have some "why are you even doing things this way" type of question, please let me know because it might also be the case that I'm going about this entirely wrong.

Thanks!

Answer 1

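The load optimizations only control what is fetched into the objects themselves. They will not optimize the filter() applied to the query. Also, your query only asks for scalars and does not load the full Person object, so there is nothing to optimize. I.e., each result row is just a 2-tuple of scalars: Person.salad_preference, func.count(Person.salad_preference). That is what the "only expression-based entities" error is complaining about: there is no Person entity in the query for load_only() or joinedload() to apply to.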

from sqlalchemy.orm import load_only

# Load a full Person object, but fetch only one of its columns up front.
person = session.query(Person).options(load_only(Person.salad_preference)).first()
# Only this attribute was loaded, so we can access it without triggering a query
person.salad_preference
# This attribute is not loaded, but will be loaded when we access it
person.employer
# Here only the column value is selected; load_only() would not make sense
# because there is no `person` object for it to apply to.
(person_salad_preference,) = session.query(Person.salad_preference).first()

I'm not sure what you are trying to do with this filter:

Person.employer.evilness == Employer.Evilness.super_evil

Are you trying to filter over a join to Person.employer?
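
If the goal is to filter across a join to Person.employer, one way to write it is to join explicitly and filter on the Employer column, since a relationship attribute such as Person.employer does not expose the related table's columns directly. The following is only a minimal sketch under assumptions: the model definitions, the employer_id foreign key, the module-level Evilness enum (the question nests it as Employer.Evilness), and the in-memory SQLite database are all made up for illustration, and it assumes SQLAlchemy 1.4 or later.

```python
import enum

from sqlalchemy import Column, Enum, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Evilness(enum.Enum):
    # Hypothetical enum; the question refers to it as Employer.Evilness.
    not_evil = "NOT_EVIL"
    super_evil = "SUPER_EVIL"


class Employer(Base):
    __tablename__ = "employer"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    evilness = Column(Enum(Evilness), nullable=False)


class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    salad_preference = Column(String, nullable=False)
    employer_id = Column(Integer, ForeignKey("employer.id"), nullable=False)
    employer = relationship(Employer)


# Throwaway in-memory database populated with the rows from the question.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

big_corp = Employer(name="bigCorp", evilness=Evilness.not_evil)
evil_corp = Employer(name="evilCorp", evilness=Evilness.super_evil)
session.add_all([
    Person(name="Alice", salad_preference="Caesar", employer=big_corp),
    Person(name="Bob", salad_preference="Greek", employer=evil_corp),
    Person(name="Charlo", salad_preference="Greek", employer=big_corp),
    Person(name="Derek", salad_preference="Caesar", employer=evil_corp),
])
session.commit()

# Join along the relationship, filter on the joined table's column,
# then group and count. No loader options: only scalars are selected.
rows = (
    session.query(Person.salad_preference, func.count(Person.id))
    .join(Person.employer)
    .filter(Employer.evilness == Evilness.super_evil)
    .group_by(Person.salad_preference)
    .all()
)
salad_preferences_with_counts = {salad: count for salad, count in rows}
print(salad_preferences_with_counts)
```

With the sample rows above, the dictionary holds one "Caesar" and one "Greek", matching the output the question asks for. The join here is an ordinary SQL JOIN used for filtering; joinedload() is a separate mechanism that only matters when full Person objects are being loaded.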

solution1
0 2021-07-20 04:28:50
