How to aggregate (min/max etc.) over Django JSONField data?
I'm using Django 1.9 with its built-in JSONField and Postgres 9.4. In my model's attrs JSON field I store objects with some values, including numbers, and I need to aggregate over them to find min/max values. Something like this:
Model.objects.aggregate(min=Min('attrs__my_key'))
Also, it would be useful to extract specific keys:
Model.objects.values_list('attrs__my_key', flat=True)
The above queries fail with:
FieldError: "Cannot resolve keyword 'my_key' into field. Join on 'attrs' not permitted."
Is it possible somehow?
Notes:
From Django 1.11 (which isn't out yet, so this might change) you can use django.contrib.postgres.fields.jsonb.KeyTextTransform instead of RawSQL.
In Django 1.10 you have to copy/paste KeyTransform into your own KeyTextTransform and replace the -> operator with ->> and #> with #>> so it returns text instead of JSON objects.
Model.objects.annotate(
    val=KeyTextTransform('json_field_key', 'blah__json_field')
).aggregate(min=Min('val'))
You can even include KeyTextTransforms in SearchVectors for full text search:
Model.objects.annotate(
    search=SearchVector(
        KeyTextTransform('jsonb_text_field_key', 'json_field')
    )
).filter(search='stuff I am searching for')
Remember you can also index keys inside jsonb fields, so consider that based on your specific workload.
For those who are interested, I've found the solution (or a workaround at least).
from django.db.models import Min
from django.db.models.expressions import RawSQL

Model.objects.annotate(
    val=RawSQL("((attrs->>%s)::numeric)", (json_field_key,))
).aggregate(min=Min('val'))
Note that the attrs->>%s expression will become something like attrs->>'width' after processing (note the single quotes). So if you hardcode the key name, remember to include the quotes yourself or you will get an error.
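The quoting point can be tried without Postgres. The sketch below is only an illustration: it swaps in SQLite (through Python's sqlite3 module) and its json_extract function instead of Postgres and ->>, and the table name item and both helper functions are made up for the example. It assumes an SQLite build that includes the JSON functions.

```python
import json
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (attrs TEXT)")
rows = [{"width": 10}, {"width": 3.5}, {"height": 7}]
conn.executemany(
    "INSERT INTO item (attrs) VALUES (?)",
    [(json.dumps(r),) for r in rows],
)


def min_of_key(key):
    # The key travels as a bound parameter, so no quotes are written
    # around the placeholder; the driver takes care of that.
    sql = "SELECT MIN(CAST(json_extract(attrs, '$.' || ?) AS REAL)) FROM item"
    return conn.execute(sql, (key,)).fetchone()[0]


def min_of_width_hardcoded():
    # With the key hardcoded, the quotes are part of the SQL text itself.
    sql = "SELECT MIN(CAST(json_extract(attrs, '$.width') AS REAL)) FROM item"
    return conn.execute(sql).fetchone()[0]
```

Both helpers return the same minimum for width; rows that lack the key produce NULL, which MIN skips.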
/// A little bit offtopic ///
And one more tricky issue, not related to Django itself, that needs to be handled somehow. As attrs is a JSON field with no restrictions on its keys and values, you can (depending on your application logic) get some non-numeric values in, for example, the width key. In this case you will get a DataError from Postgres when executing the above query. NULL values will be ignored meanwhile, so that's OK. If you can just catch the error then no problem, you're lucky. In my case I needed to ignore wrong values, and the only way here is to write a custom Postgres function that will suppress casting errors.
create or replace function safe_cast_to_numeric(text) returns numeric as $$
begin
return cast($1 as numeric);
exception
when invalid_text_representation then
return null;
end;
$$ language plpgsql immutable;
And then use it to cast text to numbers:
Model.objects.annotate(
    val=RawSQL("safe_cast_to_numeric(attrs->>%s)", (json_field_key,))
).aggregate(min=Min('val'))
Thus we get quite a solid solution for such a dynamic thing as JSON.
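To check expectations about safe_cast_to_numeric without a database, its behaviour can be approximated in plain Python. This is only a rough mirror, assuming decimal.Decimal as a stand-in for Postgres numeric; Decimal's parsing rules are not identical to Postgres's, and the sample values are invented.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def safe_cast_to_numeric(text: Optional[str]) -> Optional[Decimal]:
    """Return text as a Decimal, or None when it is missing or not a number."""
    if text is None:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        # Plays the role of the invalid_text_representation handler.
        return None


# Text values as the ->> operator would return them (made-up sample data).
raw_widths = ["10", "3.5", "n/a", None, "42"]

# Dropping None here imitates SQL aggregates ignoring NULL.
widths = [w for w in map(safe_cast_to_numeric, raw_widths) if w is not None]
smallest = min(widths)
```

Here "n/a" and None both end up excluded, so the minimum is taken over the three parsable values only.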
I know this is a bit late (several months), but I came across this post while trying to do the same thing. I managed to do it by:
1) using KeyTextTransform to convert the jsonb value to text
2) using Cast to convert it to an integer, so that the SUM works:
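# Import path for Django 1.11 to 3.0; from 3.1 use django.db.models.fields.json
from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import IntegerField, Sum
from django.db.models.functions import Cast

q = myModel.objects.filter(type=9) \
    .annotate(numeric_val=Cast(KeyTextTransform(sum_field, 'data'), IntegerField())) \
    .aggregate(Sum('numeric_val'))
print(q)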
where 'data' is the jsonb property, and 'numeric_val' is the name of the variable I create by annotating.
Hope this helps somebody!
It is possible to do this using a Postgres function:
https://www.postgresql.org/docs/9.5/functions-json.html
from django.db.models import Func, F, FloatField, Min
from django.db.models.expressions import Value
from django.db.models.functions import Cast
text = Func(F(json_field), Value(json_key), function='jsonb_extract_path_text')
floatfield = Cast(text, FloatField())
Model.objects.aggregate(min=Min(floatfield))
This is much better than using RawSQL because it doesn't break when you do a more complex query, where Django uses aliases and where there are field name collisions. There is so much going on in the ORM that hand-written implementations can come back to bite you.
Since Django 3.1 the KeyTextTransform function on a JSON field works for all database backends. It maps to the ->> operator in Postgres.
It can be used to annotate the queryset with a specific JSON value from inside a JSONField before you aggregate it. A clearer example of how to use this:
First we need to annotate the key you want to aggregate. So if you have a Django model with a JSONField named data and the JSON it contains looks like this:
{
"age": 43,
"name": "John"
}
You would annotate the queryset as follows:
from django.db.models import IntegerField
from django.db.models.fields.json import KeyTextTransform
from django.db.models.functions import Cast

qs = Model.objects.annotate(
    age=Cast(
        KeyTextTransform("age", "data"), IntegerField()
    )
)
The Cast is needed to stay compatible with all database backends.
Now you can aggregate to your liking:
from django.db.models import Min, Max, Avg, IntegerField
from django.db.models.functions import Cast, Round
qs.aggregate(
min_age=Round(Min("age")),
max_age=Round(Max("age")),
avg_age=Cast(Round(Avg("age")), IntegerField()),
)
>>> {'min_age': 25, 'max_age': 82, 'avg_age': 33}
It seems there is no native way to do it.
I worked around it like this:
my_queryset = Product.objects.all() # Or .filter()...
max_val = max(o.my_json_field.get(my_attrib, '') for o in my_queryset)
This is far from marvelous, since it is done at the Python level (and not at the SQL level).
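One caveat with a Python-level approach: in Python 3, comparing a string default such as '' with numbers raises a TypeError, so rows that lack the key need to be skipped instead of defaulted. A minimal sketch of that idea follows, using plain dicts as stand-ins for the JSONField values; the numeric_values helper and the sample data are hypothetical.

```python
from numbers import Real


def numeric_values(json_objects, key):
    """Yield only the real-number values stored under key."""
    for obj in json_objects:
        value = obj.get(key)
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, Real) and not isinstance(value, bool):
            yield value


# Stand-ins for o.my_json_field on each model instance (made-up data).
attrs_list = [{"width": 10}, {"width": "wide"}, {"height": 4}, {"width": 2.5}]

# default=None avoids a ValueError when no row has a usable value.
max_width = max(numeric_values(attrs_list, "width"), default=None)
min_width = min(numeric_values(attrs_list, "width"), default=None)
```

This still loads every row into Python, so it only suits small querysets.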
from django.db.models.functions import Cast
from django.db.models import FloatField, Max, Min
qs = Model.objects.annotate(
val=Cast('attrs__key', FloatField())
).aggregate(
min=Min("val"),
max=Max("val")
)