How to unpack the result of a sub-query into a list-type field in the result of the original query in peewee?

How can I make peewee put the ids of related table rows into an additional list-like field of the resulting query?

I want to make a duplicate-detecting manager for media files. For each file on my PC I have a record in the database with fields like

File name, Size, Path, SHA3-512, Perceptual hash, Tags, Comment, Date added, Date changed, etc...

Depending on the situation, I want to use different patterns for deciding which records in the table count as duplicates.

In the simplest case I just want to see all records that have the same hash, so I write

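import peewee

# Hashes that occur more than once
subq = Record.select(Record.SHA).group_by(Record.SHA).having(peewee.fn.COUNT(Record.id) > 1)
subq = subq.alias('jq')
# Join back to fetch every record carrying such a hash
q = Record.select().join(subq, on=(Record.SHA == subq.c.SHA)).order_by(Record.SHA)
for r in q:
    process_record_in_some_way(r)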

and everything is fine. But there are many cases where I want to use different sets of table columns as the grouping pattern. In the worst case I use all of them except id and the "Date added" column, to detect exactly duplicated rows created when I re-added the same file a few times. That leads to a monster like

subq = Record.select(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).group_by(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).having(peewee.fn.COUNT(Record.id) > 1)
subq = subq.alias('jq')
# Conditions are combined with &, because Python's `and` does not build an SQL AND
q = Record.select().join(subq, on=((Record.SHA == subq.c.SHA) & (Record.Name == subq.c.Name) & (Record.Date == subq.c.Date) & (Record.Size == subq.c.Size) & (Record.Tags == subq.c.Tags))).order_by(Record.SHA)
for r in q:
    process_record_in_some_way(r)

and this is not the full list of my fields, just an example. I have to do the same thing for other field-set patterns, i.e. repeat the field list three times: in the select clause and the group-by clause of the subquery, and then again in the join clause.

I wish I could just group the records by the appropriate pattern and have peewee list the ids of all members of each group in a new list field, like

q=Record.select(Record, SOME_MAJIC.alias('duplicates')).group_by(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).having(peewee.fn.Count() > 1).SOME_ANOTHER_MAJIC
for r in q:
    process_group_of_records(r) # r.duplicates == [23, 44, 45, 56, 100], for example

How can I do this? Listing the same parameters three times, I really feel like I am doing something wrong.

You can use GROUP_CONCAT (or, for Postgres, array_agg) to group and concatenate a list of ids, filenames, or whatever you need.

So for files with the same hash:

from peewee import fn

# One row per hash; id_list holds the ids of that group as a single string
query = (Record
         .select(Record.sha, fn.GROUP_CONCAT(Record.id).alias('id_list'))
         .group_by(Record.sha)
         .having(fn.COUNT(Record.id) > 1))

This is a relational database, so you are always dealing with tables consisting of rows and columns. There's no "nesting". GROUP_CONCAT is about as close as you can get.
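The point about flat rows can be seen at the SQL level. This is a small sketch using Python's built-in sqlite3 module rather than peewee, with a hypothetical three-column `record` table: each group comes back as one ordinary row whose ids are packed into a single text value, and any list structure has to be rebuilt in application code.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT, sha TEXT)')
conn.executemany(
    'INSERT INTO record (name, sha) VALUES (?, ?)',
    [('a.jpg', 'x'), ('b.jpg', 'x'), ('c.jpg', 'y')])

rows = conn.execute(
    'SELECT sha, GROUP_CONCAT(id) AS id_list '
    'FROM record GROUP BY sha HAVING COUNT(id) > 1'
).fetchall()

# Each result row is still flat: (sha, 'id,id,...'). Turning the text
# into a Python list happens on the client side.
duplicates = {sha: [int(i) for i in id_list.split(',')]
              for sha, id_list in rows}
```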
