Django ORM limiting queryset to only return a subset of data

I have the following query in a Django app. The user field is a foreign key. The results may contain 1000 MyModel objects, but only for a handful of users. I'd like to limit the results to 5 MyModel objects per user listed in the user__in= portion of the query, so that I end up with at most 5 × (number of users) MyModel objects.

lfs = MyModel.objects.filter(
    user__in=[some,users,here,],
    active=True,
    follow=True,
)

Either through the ORM or SQL (using Postgres) would be acceptable.

Thanks

EDIT 2

Found a simpler way to get this done, which I've added as an answer below.

EDIT

Some of the links mentioned in the comments had good information, although none of them really worked with Postgres or the Django ORM. For anyone else looking for this in the future, my adaptation of the code in those other questions/answers is here.

To implement this in Postgres 9.1, I had to create a couple of functions using PL/Perl (plperl), which also required me to install PL/Perl.

CREATE OR REPLACE FUNCTION set_int_var(name text, val bigint) RETURNS bigint AS $$
    # Store the value in %_SHARED, which persists between calls within a session.
    $_SHARED{$_[0]} = $_[1];
    return $_[1];
$$ LANGUAGE plperl;

CREATE OR REPLACE FUNCTION get_int_var(name text) RETURNS bigint AS $$
    return $_SHARED{$_[0]};
$$ LANGUAGE plperl;

And my final query looks something like the following

SELECT x.id, x.ranking, x.active, x.follow, x.user_id
FROM (
    SELECT tbl.id, tbl.active, tbl.follow, tbl.user_id,
           CASE WHEN get_int_var('user_id') != tbl.user_id
                THEN set_int_var('rownum', 1)
                ELSE set_int_var('rownum', get_int_var('rownum') + 1)
           END AS ranking,
           set_int_var('user_id', tbl.user_id)
    FROM my_table AS tbl
    WHERE tbl.active = TRUE AND tbl.follow = TRUE
    ORDER BY tbl.user_id
) AS x
WHERE x.ranking <= 5
ORDER BY x.user_id
LIMIT 50

The only downside to this is that if I try to limit the users that it looks for by using user_id IN (), the whole thing breaks and it just returns every row, rather than just 5 per user.
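The counter logic that the query above builds from set_int_var and get_int_var can be sketched in plain Python. This is only an illustration of the idea, not the author's code: the function name, the (id, user_id) tuple layout and the sample data are all assumptions, and the sketch assumes the shared variables start out unset so that the first row counts as a new user.

```python
def rank_within_user(rows, limit=5):
    """Mimic the shared-variable row counter over rows sorted by user_id.

    `rows` is an iterable of (id, user_id) tuples (a hypothetical layout).
    Returns (id, user_id, ranking) tuples with ranking <= limit.
    """
    kept = []
    last_user = None  # plays the role of the shared 'user_id' variable
    rownum = 0        # plays the role of the shared 'rownum' variable
    for row_id, user_id in sorted(rows, key=lambda r: r[1]):
        if user_id != last_user:
            rownum = 1       # first row seen for a new user
        else:
            rownum += 1      # another row for the same user
        last_user = user_id
        if rownum <= limit:  # corresponds to WHERE x.ranking <= 5
            kept.append((row_id, user_id, rownum))
    return kept


# Hypothetical sample data: user 1 owns 7 rows, user 2 owns 3 rows.
sample_rows = [(i, 1) for i in range(1, 8)] + [(i, 2) for i in range(8, 11)]
ranked = rank_within_user(sample_rows)
```

With this sample, user 1 is trimmed to 5 rows and user 2 keeps all 3, so the counter only works if rows for the same user arrive consecutively, which is why the inner query orders by user_id.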

This is what ended up working, and allowed me to only select a handful of users, or all users (by removing the AND mt.user_id IN () line).

SELECT * FROM mytable
WHERE (id, user_id, follow, active) IN (
    SELECT id, user_id, follow, active FROM mytable mt
    WHERE mt.user_id = mytable.user_id
    AND mt.user_id IN (1, 2)
    ORDER BY user_id LIMIT 5)
ORDER BY likeable
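To try the correlated-subquery idea without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for Postgres here, the table contents are invented, and the query is simplified to match on id only and to order the subquery by id, so treat it as an approximation of the query above rather than a drop-in replacement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "follow INTEGER, active INTEGER)"
)
# Hypothetical data: users 1, 2 and 3 each own 8 rows (ids 1-8, 9-16, 17-24).
conn.executemany(
    "INSERT INTO mytable (user_id, follow, active) VALUES (?, 1, 1)",
    [(user_id,) for user_id in (1, 2, 3) for _ in range(8)],
)

# For each outer row, the subquery lists the first 5 ids of that row's user;
# the outer row is kept only if its own id is among them.
query = """
    SELECT o.id, o.user_id FROM mytable AS o
    WHERE o.id IN (
        SELECT mt.id FROM mytable AS mt
        WHERE mt.user_id = o.user_id
          AND mt.user_id IN (1, 2)
        ORDER BY mt.id
        LIMIT 5
    )
    ORDER BY o.user_id, o.id
"""
rows = conn.execute(query).fetchall()

# Group the returned ids by user to check the per-user limit.
per_user = {}
for row_id, user_id in rows:
    per_user.setdefault(user_id, []).append(row_id)
```

Here per_user ends up with 5 ids for each of users 1 and 2, and nothing for user 3, which is excluded by the IN (1, 2) filter.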

I think this is what you were looking for (I didn't see it in other posts):

https://docs.djangoproject.com/en/dev/topics/db/queries/#limiting-querysets

Other examples convert the queryset to a list before "slicing". If you slice the queryset itself instead, like this (for example):

    lfs = MyModel.objects.filter(
        user__in=[some,users,here,],
        active=True,
        follow=True,
    )[:10]

the resulting SQL is a query with a LIMIT 10 clause.

So, the query you are looking for would be something like this:

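mymodel_ids = []
for user in users:
    # One query per user: fetch at most 5 ids (slicing becomes LIMIT 5).
    mymodel_5ids_for_user = MyModel.objects.filter(
        user=user,
        active=True,
        follow=True,
    ).values_list('id', flat=True)[:5]

    mymodel_ids.extend(mymodel_5ids_for_user)

# One final query to load the full objects for the collected ids.
lfs = MyModel.objects.filter(id__in=mymodel_ids)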

lfs then holds the MyModel objects you were looking for (at most 5 entries per user).

This takes at least one query per user, plus one more to retrieve all the MyModel objects with the final filter.

Be careful about the order in which the objects are returned. Unless the "mymodel_5ids_for_user" query has an explicit ordering, the database does not guarantee which rows the [:5] slice returns, and if you change that ordering, the first 5 elements can change.
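Since the question mentions only about 1000 rows, another option is to fetch everything in one query and trim to N per user in Python. The sketch below is a hedged illustration with plain dictionaries: the helper name, the "created" field and the sample records are invented, and in a Django app the records might come from something like MyModel.objects.filter(...).values(...). It also shows how the chosen ordering decides which rows survive.

```python
from itertools import groupby, islice
from operator import itemgetter


def first_n_per_user(records, n=5, order_key=itemgetter("id")):
    """Return at most n records per user, picked by order_key within each user."""
    # Sort by user first, then by the chosen ordering inside each user,
    # so that groupby sees each user's records as one consecutive run.
    ordered = sorted(records, key=lambda r: (r["user_id"], order_key(r)))
    result = []
    for _, group in groupby(ordered, key=itemgetter("user_id")):
        result.extend(islice(group, n))
    return result


# Hypothetical records; "created" is an invented field used only to show
# that a different ordering selects different rows.
records = [
    {"id": 1, "user_id": 1, "created": 30},
    {"id": 2, "user_id": 1, "created": 10},
    {"id": 3, "user_id": 1, "created": 20},
    {"id": 4, "user_id": 2, "created": 5},
]

by_id = [r["id"] for r in first_n_per_user(records, n=2)]
by_created = [
    r["id"]
    for r in first_n_per_user(records, n=2, order_key=itemgetter("created"))
]
```

Ordering by id keeps ids 1 and 2 for user 1, while ordering by the invented "created" field keeps ids 2 and 3 instead, so the ordering should be chosen deliberately.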
