
How to aggregate (min/max etc.) over Django JSONField data?

I'm using Django 1.9 with its built-in JSONField and Postgres 9.4. In my model's attrs JSON field I store objects with some values, including numbers, and I need to aggregate over them to find min/max values. Something like this:

Model.objects.aggregate(min=Min('attrs__my_key'))

Also, it would be useful to extract specific keys:

Model.objects.values_list('attrs__my_key', flat=True)

The above queries fail with

FieldError: "Cannot resolve keyword 'my_key' into field. Join on 'attrs' not permitted."

Is it possible somehow?

Notes:

  1. I know how to write a plain Postgres query to do the job, but I'm looking specifically for an ORM solution so that I keep the ability to filter etc.
  2. I suppose this can be done with the (relatively) new query expressions/lookups API, but I haven't studied it yet.

From Django 1.11 (which isn't out yet at the time of writing, so this might change) you can use django.contrib.postgres.fields.jsonb.KeyTextTransform instead of RawSQL.

In Django 1.10 you have to copy/paste KeyTransform into your own KeyTextTransform and replace the -> operator with ->> and #> with #>>, so that it returns text instead of JSON objects.

from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import Min

Model.objects.annotate(
    val=KeyTextTransform('json_field_key', 'blah__json_field')
).aggregate(min=Min('val'))
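To make the difference between -> and ->> concrete, here is a rough plain-Python emulation of what the two operators return. It is only a sketch: the helper names arrow and arrow_text and the sample attrs dict are made up for illustration and are not part of Django or Postgres.

```python
import json


def arrow(doc, key):
    """Rough stand-in for `doc -> key`: the value still encoded as JSON.

    Returns None (standing in for SQL NULL) when the key is absent.
    """
    if key not in doc:
        return None
    return json.dumps(doc[key])


def arrow_text(doc, key):
    """Rough stand-in for `doc ->> key`: the value as plain text.

    A missing key or a JSON null both come back as None (SQL NULL).
    """
    if key not in doc or doc[key] is None:
        return None
    value = doc[key]
    # Strings lose their JSON quotes; other values keep their JSON spelling.
    return value if isinstance(value, str) else json.dumps(value)


# Hypothetical sample data standing in for one row's attrs column.
attrs = {"width": "10", "label": "wide", "depth": 3}

print(arrow(attrs, "width"))       # JSON string, quotes included
print(arrow_text(attrs, "width"))  # plain text, ready to be cast to a number
```

The text form is what aggregates and casts in the answers here operate on, which is why the transform has to use ->> rather than ->.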

You can even include KeyTextTransforms in SearchVectors for full text search:

from django.contrib.postgres.search import SearchVector

Model.objects.annotate(
    search=SearchVector(
        KeyTextTransform('jsonb_text_field_key', 'json_field')
    )
).filter(search='stuff I am searching for')

Remember that you can also index jsonb fields, so consider doing that depending on your specific workload.

For those who are interested, I've found the solution (or at least a workaround).

from django.db.models import Min
from django.db.models.expressions import RawSQL

# json_field_key holds the name of the JSON key, e.g. 'width'
Model.objects.annotate(
    val=RawSQL("((attrs->>%s)::numeric)", (json_field_key,))
).aggregate(min=Min('val'))

Note that the attrs->>%s expression becomes something like attrs->>'width' after processing (note the single quotes). So if you hardcode the key name, remember to include the quotes yourself or you will get an error.

/// A little bit off-topic ///

One more tricky issue, not related to Django itself, still has to be handled somehow. Since attrs is a JSON field with no restrictions on its keys and values, you can (depending on your application logic) end up with non-numeric values in, for example, the width key. In that case executing the above query raises a DataError from Postgres. NULL values are simply ignored, so those are fine. If you can just catch the error, no problem, you're lucky. In my case I needed to ignore the bad values, and the only way I found was to write a custom Postgres function that suppresses casting errors.

create or replace function safe_cast_to_numeric(text) returns numeric as $$
begin
    return cast($1 as numeric);
exception
    when invalid_text_representation then
        return null;
end;
$$ language plpgsql immutable;

And then use it to cast text to numbers:

Model.objects.annotate(
    val=RawSQL("safe_cast_to_numeric(attrs->>%s)", (json_field_key,))
).aggregate(min=Min('val'))

This gives a fairly robust solution for something as dynamic as JSON.
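To see how the safe cast changes the aggregate, here is a plain-Python sketch that mimics the behaviour described above. It is not Django or Postgres code. The Python function safe_cast_to_numeric only approximates the SQL function of the same name, and aggregate_min and the sample values are hypothetical.

```python
from decimal import Decimal, InvalidOperation


def safe_cast_to_numeric(text):
    """Approximate the SQL function: a Decimal, or None if text is not numeric."""
    if text is None:
        return None
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    # Simplification for this sketch: treat NaN/Infinity as invalid too.
    return value if value.is_finite() else None


def aggregate_min(texts):
    """Mimic MIN(): None (NULL) values are skipped; no usable rows gives None."""
    numeric = [v for v in map(safe_cast_to_numeric, texts) if v is not None]
    return min(numeric) if numeric else None


# Hypothetical results of attrs->>'width' for four rows.
width_texts = ["10", "abc", None, "7.5"]
print(aggregate_min(width_texts))
```

The bad value "abc" and the NULL are both skipped, so the minimum is computed from the two numeric rows only instead of the whole query failing.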

I know this is a bit late (several months), but I came across this post while trying to do the same thing. I managed to do it by:

1) using KeyTextTransform to convert the jsonb value to text

2) using Cast to convert it to an integer, so that the SUM works:

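from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import IntegerField, Sum
from django.db.models.functions import Cast

q = myModel.objects.filter(type=9) \
.annotate(numeric_val=Cast(KeyTextTransform(sum_field, 'data'), IntegerField())) \
.aggregate(Sum('numeric_val'))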

print(q)

where 'data' is the jsonb field, sum_field is the name of the key to sum, and 'numeric_val' is the name of the annotation I create.

Hope this helps somebody!

It is possible to do this using a Postgres function:

https://www.postgresql.org/docs/9.5/functions-json.html

from django.db.models import Func, F, FloatField, Min
from django.db.models.expressions import Value
from django.db.models.functions import Cast

text = Func(F(json_field), Value(json_key), function='jsonb_extract_path_text')
floatfield = Cast(text, FloatField())

Model.objects.aggregate(min=Min(floatfield))

This is much better than using RawSQL because it doesn't break in more complex queries, where Django uses aliases or where field names collide. There is so much going on in the ORM that hand-written SQL can come back to bite you.

Since Django 3.1 the KeyTextTransform function on a JSON field works for all database backends. It maps to the ->> operator in Postgres.

It can be used to annotate the queryset with a specific JSON value from inside a JSONField before you aggregate over it. A clearer example of how to use this:

First we need to annotate the key you want to aggregate. So if you have a Django model with a JSONField named data and the JSON it contains looks like this:

{
    "age": 43,
    "name": "John"
}

You would annotate the queryset as follows:

from django.db.models import IntegerField
from django.db.models.fields.json import KeyTextTransform
from django.db.models.functions import Cast

qs = Model.objects.annotate(
    age=Cast(
        KeyTextTransform("age", "data"), IntegerField()
    )
)

The Cast is needed to stay compatible with all database backends.

Now you can aggregate to your liking:

from django.db.models import Min, Max, Avg, IntegerField
from django.db.models.functions import Cast, Round

qs.aggregate(
    min_age=Round(Min("age")),
    max_age=Round(Max("age")),
    avg_age=Cast(Round(Avg("age")), IntegerField()),
)

>>> {'min_age': 25, 'max_age': 82, 'avg_age': 33}

It seems there is no native way to do it.

I worked around it like this:

my_queryset = Product.objects.all() # Or .filter()...
max_val = max(o.my_json_field.get(my_attrib, '') for o in my_queryset)

This is far from ideal, since it is done at the Python level (and not at the SQL level).
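If you do go the Python route, rows with a missing key or a non-numeric value need some care, because comparing '' with numbers raises a TypeError in Python 3. A slightly more defensive sketch follows. The helper max_json_value is hypothetical, and SimpleNamespace objects stand in for model instances so the example runs without a database.

```python
from types import SimpleNamespace


def max_json_value(objects, attrib, field="my_json_field"):
    """Return the largest numeric value stored under `attrib`, or None.

    Rows where the key is missing or the value is not a number are skipped.
    """
    values = []
    for obj in objects:
        data = getattr(obj, field) or {}
        value = data.get(attrib)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            values.append(value)
    return max(values, default=None)


# Stand-ins for Product instances; in real code this would be a queryset.
products = [
    SimpleNamespace(my_json_field={"width": 10}),
    SimpleNamespace(my_json_field={"width": "n/a"}),
    SimpleNamespace(my_json_field={"height": 3}),
    SimpleNamespace(my_json_field={"width": 12.5}),
]

print(max_json_value(products, "width"))
```

This still loads every row into memory, so it only makes sense for small querysets.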

from django.db.models.functions import Cast
from django.db.models import FloatField, Max, Min

qs = Model.objects.annotate(
    val=Cast('attrs__key', FloatField())
).aggregate(
    min=Min("val"),
    max=Max("val")
)
