SQL: How to get records with unique column values and sum the values in another column

Question

I have this table called file:

id         integer primary key,
created_on timestamp
updated_on timestamp 
file_name  text not null
path       text not null unique
hash       text not null
size       bigint not null
size_mb    bigint not null

I want to get all the records with a unique hash value (that is, a single instance of each set of duplicated files) and then sum the values in the size column to get the total bytes of disk space I'll need to back up a single copy of each file.
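To try the queries in the answer, the column list above can be turned into a table with a few rows. This is only a sketch: the file names, paths, hashes and sizes are made-up sample values, not data from the question, and the size_mb values are placeholders.

```sql
-- Table built from the column list in the question
create table file (
   id         integer primary key,
   created_on timestamp,
   updated_on timestamp,
   file_name  text not null,
   path       text not null unique,
   hash       text not null,
   size       bigint not null,
   size_mb    bigint not null
);

-- Hypothetical sample rows: ids 1 and 2 are duplicates (same hash 'h1'),
-- id 3 has a hash that occurs only once
insert into file (id, file_name, path, hash, size, size_mb) values
   (1, 'a.txt', '/docs/a.txt',   'h1', 1000,    0),
   (2, 'a.txt', '/backup/a.txt', 'h1', 1000,    0),
   (3, 'b.bin', '/data/b.bin',   'h2', 5000000, 5);
```

With these rows, "one copy of each file" means one of the two 'h1' rows plus the 'h2' row, so the total to back up would be 1000 + 5000000 = 5001000 bytes.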

Answer 1

This returns only the rows whose hash is unique, i.e. files for which no duplicates exist:

select *, 
   -- group sum of all files
   sum(size) over () 
from
 (
   select *, 
      -- rows per hash
      count(*) over (partition by hash) as cnt
   from file
 ) as dt
where cnt = 1

Edit: This returns only one row per hash:

select *, 
   -- group sum of all files
   sum(size) over () 
from
 (
   select *, 
      -- unique number per hash
      row_number() over (partition by hash order by hash) as rn
   from file
 ) as dt
where rn = 1

Both queries are Standard SQL, but PostgreSQL also supports proprietary syntax:

select *, 
   -- group sum of all files
   sum(size) over () 
from
 (
   select DISTINCT ON (hash) *
   from file
   order by hash
 ) as dt
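If only the total is needed and not the rows themselves, a plain aggregate may be enough. A minimal sketch, assuming that files with the same hash always have the same size (so min(size) simply picks that value); total_bytes is an alias chosen here for illustration:

```sql
select sum(size) as total_bytes
from
 (
   -- one row per hash, carrying the size of that file
   select hash, min(size) as size
   from file
   group by hash
 ) as dt
```

This returns a single row with the total, whereas the window-function queries above repeat the sum on every returned row.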

Solution 1
1 2020-03-05 18:48:42